Building Sense Representations in Danish by Combining Word Embeddings with Lexical Resources
Publication: Contribution to book/anthology/report › Article in proceedings › Research › peer-reviewed
Documents
- 2020.globalex-1.8
Published version, PDF document
Our aim is to identify suitable sense representations for NLP in Danish. We investigate sense inventories that correlate with human interpretations of word meaning and ambiguity, as typically described in dictionaries and wordnets, and that are well reflected distributionally, as expressed in word embeddings. To this end, we study a number of highly ambiguous Danish nouns and examine the effectiveness of sense representations constructed by combining vectors from a distributional model with information from a wordnet. We establish representations based on centroids obtained from wordnet synsets and example sentences, as well as representations established via a clustering approach; these representations are tested in a word sense disambiguation task. We conclude that the more information is extracted from the wordnet entries (example sentences, definitions, semantic relations), the more successful the sense representation vector.
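As a rough illustration of the centroid idea described in the abstract, the Python sketch below builds one vector per sense by averaging the embeddings of words taken from that sense's wordnet entry, then disambiguates an occurrence by choosing the sense whose centroid has the highest cosine similarity to the centroid of the context words. This is a minimal sketch under stated assumptions, not the authors' implementation: the 3-dimensional embeddings, the sense labels `kort_map` and `kort_card`, the word lists, and the helper names (`centroid`, `cosine`, `text_vector`, `disambiguate`) are all hypothetical toy values chosen for illustration.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]

# Hypothetical 3-dimensional toy embeddings; a real setup would load
# vectors from a trained distributional model instead.
EMBEDDINGS: Dict[str, Vector] = {
    "landkort": [1.0, 0.0, 0.0],
    "vej": [0.9, 0.1, 0.0],
    "by": [0.8, 0.2, 0.0],
    "spil": [0.0, 1.0, 0.0],
    "bunke": [0.1, 0.9, 0.0],
    "poker": [0.0, 0.9, 0.1],
}


def centroid(vectors: Sequence[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    if not vectors:
        raise ValueError("centroid() needs at least one vector")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def text_vector(words: Sequence[str], embeddings: Dict[str, Vector]) -> Vector:
    """Centroid of the embeddings of the words found in the vocabulary.

    Out-of-vocabulary words are skipped; raises ValueError if none are known.
    """
    return centroid([embeddings[w] for w in words if w in embeddings])


def disambiguate(context: Sequence[str], senses: Dict[str, Vector],
                 embeddings: Dict[str, Vector]) -> str:
    """Return the sense label whose vector is most similar to the context."""
    ctx = text_vector(context, embeddings)
    return max(senses, key=lambda label: cosine(ctx, senses[label]))


# Hypothetical wordnet-derived material for two senses of Danish "kort"
# (e.g. synonyms, definition words, example-sentence words, related synsets).
SENSES = {
    "kort_map": text_vector(["landkort", "vej", "by"], EMBEDDINGS),
    "kort_card": text_vector(["spil", "bunke", "poker"], EMBEDDINGS),
}

print(disambiguate(["vej", "by"], SENSES, EMBEDDINGS))      # kort_map
print(disambiguate(["poker", "spil"], SENSES, EMBEDDINGS))  # kort_card
```

In this sketch, drawing more material from a sense's wordnet entry (definition, example sentence, semantic relations) only changes the word list passed to `text_vector`. The abstract reports that such richer input gave more successful sense vectors; the toy data here does not attempt to reproduce that result.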
Original language | English |
---|---|
Host publication title | Globalex Workshop on Linked Lexicography : LREC 2020 Workshop Language Resources and Evaluation Conference |
Number of pages | 7 |
Place of publication | Marseille, France |
Publisher | European Language Resources Association |
Publication date | 2020 |
Pages | 45-52 |
ISBN (Electronic) | 979-10-95546-46-7 |
Status | Published - 2020 |
Links
- https://lrec2020.lrec-conf.org/media/proceedings/Workshops/Books/GLOBALEX2020book.pdf
Published version
ID: 241359613