Standard
Descriptive Attributes for Language-Based Object Keypoint Detection. / Weinman, Jerod; Belongie, Serge; Frank, Stella.
Computer Vision Systems - 14th International Conference, ICVS 2023, Proceedings. ed. / Henrik I. Christensen; Peter Corke; Renaud Detry; Jean-Baptiste Weibel; Markus Vincze. Springer, 2023. p. 444-458 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Vol. 14253 LNCS).
Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review
Harvard
Weinman, J, Belongie, S & Frank, S 2023, Descriptive Attributes for Language-Based Object Keypoint Detection. in HI Christensen, P Corke, R Detry, J-B Weibel & M Vincze (eds), Computer Vision Systems - 14th International Conference, ICVS 2023, Proceedings. Springer, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 14253 LNCS, pp. 444-458, 14th International Conference on Computer Vision Systems, ICVS 2023, Vienna, Austria, 27/09/2023.
https://doi.org/10.1007/978-3-031-44137-0_37
APA
Weinman, J., Belongie, S., & Frank, S. (2023). Descriptive Attributes for Language-Based Object Keypoint Detection. In H. I. Christensen, P. Corke, R. Detry, J-B. Weibel, & M. Vincze (Eds.), Computer Vision Systems - 14th International Conference, ICVS 2023, Proceedings (pp. 444-458). Springer. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) Vol. 14253 LNCS
https://doi.org/10.1007/978-3-031-44137-0_37
Vancouver
Weinman J, Belongie S, Frank S. Descriptive Attributes for Language-Based Object Keypoint Detection. In Christensen HI, Corke P, Detry R, Weibel J-B, Vincze M, editors, Computer Vision Systems - 14th International Conference, ICVS 2023, Proceedings. Springer. 2023. p. 444-458. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Vol. 14253 LNCS).
https://doi.org/10.1007/978-3-031-44137-0_37
Author
Weinman, Jerod ; Belongie, Serge ; Frank, Stella. / Descriptive Attributes for Language-Based Object Keypoint Detection. Computer Vision Systems - 14th International Conference, ICVS 2023, Proceedings. editor / Henrik I. Christensen ; Peter Corke ; Renaud Detry ; Jean-Baptiste Weibel ; Markus Vincze. Springer, 2023. pp. 444-458 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Vol. 14253 LNCS).
Bibtex
@inproceedings{7d94e5c6ff6a465a800c82107058a88f,
title = "Descriptive Attributes for Language-Based Object Keypoint Detection",
abstract = "Multimodal vision and language (VL) models have recently shown strong performance in phrase grounding and object detection for both zero-shot and finetuned cases. We adapt a VL model (GLIP) for keypoint detection and evaluate on NABirds keypoints. Our language-based keypoints-as-objects detector GLIP-KP outperforms baseline top-down keypoint detection models based on heatmaps and allows for zero- and few-shot evaluation. When fully trained, enhancing the keypoint names with descriptive attributes gives a significant performance boost, raising AP by as much as 6.0, compared to models without attribute information. Our model exceeds heatmap-based HRNet{\textquoteright}s AP by 4.4 overall and 8.4 on keypoints with attributes. With limited data, attributes raise zero-/one-/few-shot test AP by 1.0/3.4/1.6, respectively, on keypoints with attributes.",
keywords = "Attributes, Keypoint detection, Vision & language models",
author = "Jerod Weinman and Serge Belongie and Stella Frank",
note = "Publisher Copyright: {\textcopyright} The Author(s), under exclusive license to Springer Nature Switzerland AG 2023.; 14th International Conference on Computer Vision Systems, ICVS 2023 ; Conference date: 27-09-2023 Through 29-09-2023",
year = "2023",
doi = "10.1007/978-3-031-44137-0_37",
language = "English",
isbn = "9783031441363",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer",
pages = "444--458",
editor = "Christensen, {Henrik I.} and Peter Corke and Renaud Detry and Jean-Baptiste Weibel and Markus Vincze",
booktitle = "Computer Vision Systems - 14th International Conference, ICVS 2023, Proceedings",
address = "Switzerland",
}
RIS
TY - GEN
T1 - Descriptive Attributes for Language-Based Object Keypoint Detection
AU - Weinman, Jerod
AU - Belongie, Serge
AU - Frank, Stella
N1 - Publisher Copyright: © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023.
PY - 2023
Y1 - 2023
N2 - Multimodal vision and language (VL) models have recently shown strong performance in phrase grounding and object detection for both zero-shot and finetuned cases. We adapt a VL model (GLIP) for keypoint detection and evaluate on NABirds keypoints. Our language-based keypoints-as-objects detector GLIP-KP outperforms baseline top-down keypoint detection models based on heatmaps and allows for zero- and few-shot evaluation. When fully trained, enhancing the keypoint names with descriptive attributes gives a significant performance boost, raising AP by as much as 6.0, compared to models without attribute information. Our model exceeds heatmap-based HRNet’s AP by 4.4 overall and 8.4 on keypoints with attributes. With limited data, attributes raise zero-/one-/few-shot test AP by 1.0/3.4/1.6, respectively, on keypoints with attributes.
AB - Multimodal vision and language (VL) models have recently shown strong performance in phrase grounding and object detection for both zero-shot and finetuned cases. We adapt a VL model (GLIP) for keypoint detection and evaluate on NABirds keypoints. Our language-based keypoints-as-objects detector GLIP-KP outperforms baseline top-down keypoint detection models based on heatmaps and allows for zero- and few-shot evaluation. When fully trained, enhancing the keypoint names with descriptive attributes gives a significant performance boost, raising AP by as much as 6.0, compared to models without attribute information. Our model exceeds heatmap-based HRNet’s AP by 4.4 overall and 8.4 on keypoints with attributes. With limited data, attributes raise zero-/one-/few-shot test AP by 1.0/3.4/1.6, respectively, on keypoints with attributes.
KW - Attributes
KW - Keypoint detection
KW - Vision & language models
UR - http://www.scopus.com/inward/record.url?scp=85174519994&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-44137-0_37
DO - 10.1007/978-3-031-44137-0_37
M3 - Article in proceedings
AN - SCOPUS:85174519994
SN - 9783031441363
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 444
EP - 458
BT - Computer Vision Systems - 14th International Conference, ICVS 2023, Proceedings
A2 - Christensen, Henrik I.
A2 - Corke, Peter
A2 - Detry, Renaud
A2 - Weibel, Jean-Baptiste
A2 - Vincze, Markus
PB - Springer
T2 - 14th International Conference on Computer Vision Systems, ICVS 2023
Y2 - 27 September 2023 through 29 September 2023
ER -