Linguistically debatable or just plain wrong?

Publication: Contribution to book/anthology/report › Conference contribution in proceedings › Research › peer-reviewed

Barbara Plank
Dirk Hovy
Anders Søgaard

In linguistic annotation projects, we typically develop annotation guidelines to maximize inter-annotator agreement and learnability. However, in this position paper we question whether we should actually limit the disagreements between annotators, rather than embrace them. We present an empirical analysis of part-of-speech annotated data sets that suggests that certain disagreements are systematic across domains and languages. This points to an underlying ambiguity rather than random errors. Moreover, a quantitative analysis of disagreements reveals that the majority of them are due to linguistically debatable cases, rather than to actual annotation errors. Specifically, we show that even in the absence of annotation guidelines, only 2% of annotator choices are linguistically unmotivated.

Original language	English
Title	Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
Volume	2
Place of publication	Baltimore, Maryland
Publisher	Association for Computational Linguistics
Publication date	2014
Pages	507-511
Status	Published - 2014

ID: 107673308