Compositional Generalization in Image Captioning

Publication: Contribution to book/anthology/report › Conference contribution in proceedings › Research › peer-reviewed

Documents

OA-Compositional Generalization in Image Captioning
807 KB, PDF document

Mitja Nikolaus
Mostafa Abdou
Matthew Lamm
Rahul Aralikatte
Desmond Elliott

Image captioning models are usually evaluated on their ability to describe a held-out set of images, not on their ability to generalize to unseen concepts. We study the problem of compositional generalization, which measures how well a model composes unseen combinations of concepts when describing images. State-of-the-art image captioning models show poor generalization performance on this task. To address this, we propose a multi-task model that combines caption generation and image-sentence ranking, and uses a decoding mechanism that re-ranks the captions according to their similarity to the image. This model is substantially better at generalizing to unseen combinations of concepts than state-of-the-art captioning models.
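
As a rough illustration of the re-ranking step mentioned in the abstract, the Python sketch below sorts candidate captions by their similarity to an image in a shared embedding space. It is not the authors' implementation: the function names, the toy two-dimensional embeddings, and the choice of cosine similarity as the score are all assumptions made for the example.

```python
import math
from typing import List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rerank_captions(
    image_emb: Sequence[float],
    candidates: List[Tuple[str, Sequence[float]]],
) -> List[Tuple[str, float]]:
    """Sort candidate captions by similarity to the image, best first.

    `candidates` pairs each caption string with a (hypothetical) sentence
    embedding that lives in the same space as `image_emb`.
    """
    scored = [(caption, cosine(image_emb, emb)) for caption, emb in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy data (assumed): the image embedding is closest to the second caption.
image = [1.0, 0.0]
beam = [
    ("a white dog", [0.0, 1.0]),
    ("a black dog", [1.0, 0.1]),
]
ranked = rerank_captions(image, beam)
```

In a real system the candidates would typically come from beam search over the caption generator, and both embeddings would be produced by the jointly trained ranking component; here they are hard-coded to keep the sketch self-contained.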

Original language	English
Title	Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL)
Number of pages	12
Publisher	Association for Computational Linguistics
Publication date	1 Nov 2019
Pages	87-98
DOI	https://doi.org/10.18653/v1/K19-1009
Status	Published - 1 Nov 2019
Event	23rd Conference on Computational Natural Language Learning - Hong Kong, China. Duration: 3 Nov 2019 → 4 Nov 2019

Conference

Conference	23rd Conference on Computational Natural Language Learning
Country	China
City	Hong Kong
Period	03/11/2019 → 04/11/2019

ID: 230849989