Do end-to-end speech recognition models care about context?
Publication: Contribution to book/anthology/report › Conference article in proceedings › Research › peer-reviewed
Standard
Do end-to-end speech recognition models care about context? / Borgholt, Lasse; Havtorn, Jakob D.; Agic, Željko; Søgaard, Anders; Maaløe, Lars; Igel, Christian.
Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. Vol. 2020-October. International Speech Communication Association (ISCA), 2020. pp. 4352-4356.
RIS
TY - GEN
T1 - Do end-to-end speech recognition models care about context?
AU - Borgholt, Lasse
AU - Havtorn, Jakob D.
AU - Agic, Željko
AU - Søgaard, Anders
AU - Maaløe, Lars
AU - Igel, Christian
PY - 2020
Y1 - 2020
N2 - The two most common paradigms for end-to-end speech recognition are connectionist temporal classification (CTC) and attention-based encoder-decoder (AED) models. It has been argued that the latter is better suited for learning an implicit language model. We test this hypothesis by measuring temporal context sensitivity and evaluate how the models perform when we constrain the amount of contextual information in the audio input. We find that the AED model is indeed more context sensitive, but that the gap can be closed by adding self-attention to the CTC model. Furthermore, the two models perform similarly when contextual information is constrained. Finally, in contrast to previous research, our results show that the CTC model is highly competitive on WSJ and LibriSpeech without the help of an external language model.
KW - Attention-based encoder-decoder
KW - Automatic speech recognition
KW - Connectionist temporal classification
KW - End-to-end speech recognition
UR - http://www.scopus.com/inward/record.url?scp=85098151098&partnerID=8YFLogxK
U2 - 10.21437/Interspeech.2020-1750
DO - 10.21437/Interspeech.2020-1750
M3 - Article in proceedings
AN - SCOPUS:85098151098
VL - 2020-October
SP - 4352
EP - 4356
BT - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
PB - International Speech Communication Association (ISCA)
T2 - 21st Annual Conference of the International Speech Communication Association, INTERSPEECH 2020
Y2 - 25 October 2020 through 29 October 2020
ER -
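The abstract describes evaluating models when the amount of contextual information in the audio input is constrained. As a hypothetical illustration only (not the authors' code; the function name and windowing scheme are assumptions), constraining temporal context with a symmetric window mask over input frames might look like:

```python
from typing import List

def constrain_context(frames: List[float], center: int, context: int) -> List[float]:
    """Zero out all frames outside a symmetric window of `context` frames
    around index `center`, limiting the temporal context a model can use.

    Hypothetical sketch: the paper's actual context-constraining procedure
    may differ.
    """
    lo = max(0, center - context)
    hi = min(len(frames), center + context + 1)
    return [f if lo <= i < hi else 0.0 for i, f in enumerate(frames)]

# Keep only one frame of context on each side of frame 2.
masked = constrain_context([1.0, 1.0, 1.0, 1.0, 1.0], center=2, context=1)
# → [0.0, 1.0, 1.0, 1.0, 0.0]
```

Feeding such masked inputs to both a CTC and an AED model and comparing error rates is one way to probe how much each paradigm relies on surrounding context.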