Descriptive Attributes for Language-Based Object Keypoint Detection

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

Standard

Descriptive Attributes for Language-Based Object Keypoint Detection. / Weinman, Jerod; Belongie, Serge; Frank, Stella.

Computer Vision Systems - 14th International Conference, ICVS 2023, Proceedings. ed. / Henrik I. Christensen; Peter Corke; Renaud Detry; Jean-Baptiste Weibel; Markus Vincze. Springer, 2023. p. 444-458 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Vol. 14253 LNCS).

Harvard

Weinman, J, Belongie, S & Frank, S 2023, Descriptive Attributes for Language-Based Object Keypoint Detection. in HI Christensen, P Corke, R Detry, J-B Weibel & M Vincze (eds), Computer Vision Systems - 14th International Conference, ICVS 2023, Proceedings. Springer, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 14253 LNCS, pp. 444-458, 14th International Conference on Computer Vision Systems, ICVS 2023, Vienna, Austria, 27/09/2023. https://doi.org/10.1007/978-3-031-44137-0_37

APA

Weinman, J., Belongie, S., & Frank, S. (2023). Descriptive Attributes for Language-Based Object Keypoint Detection. In H. I. Christensen, P. Corke, R. Detry, J-B. Weibel, & M. Vincze (Eds.), Computer Vision Systems - 14th International Conference, ICVS 2023, Proceedings (pp. 444-458). Springer. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) Vol. 14253 LNCS https://doi.org/10.1007/978-3-031-44137-0_37

Vancouver

Weinman J, Belongie S, Frank S. Descriptive Attributes for Language-Based Object Keypoint Detection. In Christensen HI, Corke P, Detry R, Weibel J-B, Vincze M, editors, Computer Vision Systems - 14th International Conference, ICVS 2023, Proceedings. Springer. 2023. p. 444-458. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Vol. 14253 LNCS). https://doi.org/10.1007/978-3-031-44137-0_37

Author

Weinman, Jerod ; Belongie, Serge ; Frank, Stella. / Descriptive Attributes for Language-Based Object Keypoint Detection. Computer Vision Systems - 14th International Conference, ICVS 2023, Proceedings. editor / Henrik I. Christensen ; Peter Corke ; Renaud Detry ; Jean-Baptiste Weibel ; Markus Vincze. Springer, 2023. pp. 444-458 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Vol. 14253 LNCS).

Bibtex

@inproceedings{7d94e5c6ff6a465a800c82107058a88f,

title = "Descriptive Attributes for Language-Based Object Keypoint Detection",

abstract = "Multimodal vision and language (VL) models have recently shown strong performance in phrase grounding and object detection for both zero-shot and finetuned cases. We adapt a VL model (GLIP) for keypoint detection and evaluate on NABirds keypoints. Our language-based keypoints-as-objects detector GLIP-KP outperforms baseline top-down keypoint detection models based on heatmaps and allows for zero- and few-shot evaluation. When fully trained, enhancing the keypoint names with descriptive attributes gives a significant performance boost, raising AP by as much as 6.0, compared to models without attribute information. Our model exceeds heatmap-based HRNet{\textquoteright}s AP by 4.4 overall and 8.4 on keypoints with attributes. With limited data, attributes raise zero-/one-/few-shot test AP by 1.0/3.4/1.6, respectively, on keypoints with attributes.",

keywords = "Attributes, Keypoint detection, Vision & language models",

author = "Jerod Weinman and Serge Belongie and Stella Frank",

note = "Publisher Copyright: {\textcopyright} The Author(s), under exclusive license to Springer Nature Switzerland AG 2023.; 14th International Conference on Computer Vision Systems, ICVS 2023 ; Conference date: 27-09-2023 Through 29-09-2023",

year = "2023",

doi = "10.1007/978-3-031-44137-0_37",

language = "English",

isbn = "9783031441363",

series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

publisher = "Springer",

pages = "444--458",

editor = "Christensen, {Henrik I.} and Peter Corke and Renaud Detry and Jean-Baptiste Weibel and Markus Vincze",

booktitle = "Computer Vision Systems - 14th International Conference, ICVS 2023, Proceedings",

address = "Switzerland",

}

RIS

TY - GEN

T1 - Descriptive Attributes for Language-Based Object Keypoint Detection

AU - Weinman, Jerod

AU - Belongie, Serge

AU - Frank, Stella

N1 - Publisher Copyright: © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023.

PY - 2023

Y1 - 2023

N2 - Multimodal vision and language (VL) models have recently shown strong performance in phrase grounding and object detection for both zero-shot and finetuned cases. We adapt a VL model (GLIP) for keypoint detection and evaluate on NABirds keypoints. Our language-based keypoints-as-objects detector GLIP-KP outperforms baseline top-down keypoint detection models based on heatmaps and allows for zero- and few-shot evaluation. When fully trained, enhancing the keypoint names with descriptive attributes gives a significant performance boost, raising AP by as much as 6.0, compared to models without attribute information. Our model exceeds heatmap-based HRNet’s AP by 4.4 overall and 8.4 on keypoints with attributes. With limited data, attributes raise zero-/one-/few-shot test AP by 1.0/3.4/1.6, respectively, on keypoints with attributes.

AB - Multimodal vision and language (VL) models have recently shown strong performance in phrase grounding and object detection for both zero-shot and finetuned cases. We adapt a VL model (GLIP) for keypoint detection and evaluate on NABirds keypoints. Our language-based keypoints-as-objects detector GLIP-KP outperforms baseline top-down keypoint detection models based on heatmaps and allows for zero- and few-shot evaluation. When fully trained, enhancing the keypoint names with descriptive attributes gives a significant performance boost, raising AP by as much as 6.0, compared to models without attribute information. Our model exceeds heatmap-based HRNet’s AP by 4.4 overall and 8.4 on keypoints with attributes. With limited data, attributes raise zero-/one-/few-shot test AP by 1.0/3.4/1.6, respectively, on keypoints with attributes.

KW - Attributes

KW - Keypoint detection

KW - Vision & language models

UR - http://www.scopus.com/inward/record.url?scp=85174519994&partnerID=8YFLogxK

U2 - 10.1007/978-3-031-44137-0_37

DO - 10.1007/978-3-031-44137-0_37

M3 - Article in proceedings

AN - SCOPUS:85174519994

SN - 9783031441363

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 444

EP - 458

BT - Computer Vision Systems - 14th International Conference, ICVS 2023, Proceedings

A2 - Christensen, Henrik I.

A2 - Corke, Peter

A2 - Detry, Renaud

A2 - Weibel, Jean-Baptiste

A2 - Vincze, Markus

PB - Springer

T2 - 14th International Conference on Computer Vision Systems, ICVS 2023

Y2 - 27 September 2023 through 29 September 2023

ER -

ID: 372615567

Forskning