Integrating TEI/XML Text with Semantic Lexicographic Data

Traditional excerption-based historical dictionaries often provide a very detailed semantic analysis of a high proportion of the words in the corpora they cover. For example, the Dictionary of Old Norse Prose will have analyzed and defined around 7% of all words in an 11-million-word corpus. Linking the semantic analysis of excerpted citations to new digital texts of the works in the corpus has the potential to give much more detailed context for the citations in the dictionary and, at the same time, to provide contextual semantic information (definitions) for a high proportion of specific words in the corpus. The task is nontrivial, as it involves linking separately formed datasets consisting of tens of thousands of tokens. This paper describes a process by which a very high proportion of citations in the dictionary are linked to individual words in new digital editions, using sorting and lexical information. The result is that users of the dictionary can view the citations in their full textual context, and read
Title of host publication: DHN 2020 Digital Humanities in the Nordic Countries 2020: Post-Proceedings of the 5th Conference Digital Humanities in the Nordic Countries (DHN 2020)
Number of pages: 10
Publisher: CEUR Workshop Proceedings
Publication date: 14 May 2021
Status: Published - 14 May 2021


