Integrating TEI/XML Text with Semantic Lexicographic Data
Publication: Contribution to book/anthology/report › Conference contribution in proceedings › Research › peer-reviewed
Standard
Integrating TEI/XML Text with Semantic Lexicographic Data. / Wills, Tarrin; Jóhannsson, Ellert Þór; Battista, Simonetta.
DHN 2020 Digital Humanities in the Nordic Countries 2020: Post-Proceedings of the 5th Conference Digital Humanities in the Nordic Countries (DHN 2020). Vol. 2865 Riga: CEUR Workshop Proceedings, 2021. pp. 16-25.
RIS
TY - GEN
T1 - Integrating TEI/XML Text with Semantic Lexicographic Data
AU - Wills, Tarrin
AU - Jóhannsson, Ellert Þór
AU - Battista, Simonetta
PY - 2021/5/14
Y1 - 2021/5/14
N2 - Traditional excerption-based historical dictionaries often provide a very detailed semantic analysis of a high proportion of words in the corpora they cover. The Dictionary of Old Norse Prose will have analyzed and defined around 7% of all words in an 11-million-word corpus, for example. Linking the semantic analysis of excerpted citations to new digital texts of the works in the corpus offers the potential to give much more detailed context for the citations in the dictionary and at the same time contextual semantic information (definitions) for a high proportion of specific words in the corpus. The task is nontrivial as it involves linking separately-formed datasets consisting of tens of thousands of tokens. This paper describes a process by which a very high proportion of citations in the dictionary are linked to individual words in new digital editions, using sorting and lexical information. The result is that users of the dictionary can view the citations in their full textual context, and read
M3 - Article in proceedings
VL - 2865
SP - 16
EP - 25
BT - DHN 2020 Digital Humanities in the Nordic Countries 2020
PB - CEUR Workshop Proceedings
CY - Riga
ER -
ID: 262846022