Enhancing CLARIN-DK Resources While Building the Danish ParlaMint Corpus

Publication: Contribution to book/anthology/report, Conference article in proceedings, Research, peer-reviewed

Standard

Enhancing CLARIN-DK Resources While Building the Danish ParlaMint Corpus. / Jongejan, Bart; Hansen, Dorte Haltrup; Navarretta, Costanza.

CLARIN Annual Conference 2021 Proceedings. CLARIN ERIC, 2021. p. 70-73.


Harvard

Jongejan, B, Hansen, DH & Navarretta, C 2021, Enhancing CLARIN-DK Resources While Building the Danish ParlaMint Corpus. in CLARIN Annual Conference 2021 Proceedings. CLARIN ERIC, pp. 70-73. <https://office.clarin.eu/v/CE-2021-1923-CLARIN2021_ConferenceProceedings.pdf>

APA

Jongejan, B., Hansen, D. H., & Navarretta, C. (2021). Enhancing CLARIN-DK Resources While Building the Danish ParlaMint Corpus. In CLARIN Annual Conference 2021 Proceedings (pp. 70-73). CLARIN ERIC. https://office.clarin.eu/v/CE-2021-1923-CLARIN2021_ConferenceProceedings.pdf

Vancouver

Jongejan B, Hansen DH, Navarretta C. Enhancing CLARIN-DK Resources While Building the Danish ParlaMint Corpus. In CLARIN Annual Conference 2021 Proceedings. CLARIN ERIC. 2021. p. 70-73

Author

Jongejan, Bart ; Hansen, Dorte Haltrup ; Navarretta, Costanza. / Enhancing CLARIN-DK Resources While Building the Danish ParlaMint Corpus. CLARIN Annual Conference 2021 Proceedings. CLARIN ERIC, 2021. p. 70-73

Bibtex

@inproceedings{8728c4bd76d84fab9aca128d40e50d81,
title = "Enhancing CLARIN-DK Resources While Building the Danish ParlaMint Corpus",
abstract = "In this paper we describe the Danish CLARIN resources, corpora, tools and workflow, which we used and enhanced in order to build the Danish ParlaMint corpus, as part of the CLARIN-funded ParlaMint project. More specifically, the article accounts for the manual and automatic processes involved in the preparation of the Danish parliamentary speeches, with focus on the CLARIN-DK tools and the Text Tonsorium workflow management. The tools annotated the speeches with metadata and linguistic information in compliance with the common ParlaMint TEI P5 format. As a spin-off of the project, the CLARIN-DK sentence tokenizer and the CST Named Entity Recognizer were improved. These tools, together with the CST lemmatiser, the Danish UDPipe software and several data transformation utilities, produced all the linguistic annotations in the correct format. We conclude the paper with a report of a pilot evaluation of the quality of some of the linguistic annotations in the Danish ParlaMint corpus.",
author = "Bart Jongejan and Hansen, {Dorte Haltrup} and Costanza Navarretta",
year = "2021",
language = "English",
pages = "70--73",
booktitle = "CLARIN Annual Conference 2021 Proceedings",
publisher = "CLARIN ERIC",

}

RIS

TY - GEN

T1 - Enhancing CLARIN-DK Resources While Building the Danish ParlaMint Corpus

AU - Jongejan, Bart

AU - Hansen, Dorte Haltrup

AU - Navarretta, Costanza

PY - 2021

Y1 - 2021

N2 - In this paper we describe the Danish CLARIN resources, corpora, tools and workflow, which we used and enhanced in order to build the Danish ParlaMint corpus, as part of the CLARIN-funded ParlaMint project. More specifically, the article accounts for the manual and automatic processes involved in the preparation of the Danish parliamentary speeches, with focus on the CLARIN-DK tools and the Text Tonsorium workflow management. The tools annotated the speeches with metadata and linguistic information in compliance with the common ParlaMint TEI P5 format. As a spin-off of the project, the CLARIN-DK sentence tokenizer and the CST Named Entity Recognizer were improved. These tools, together with the CST lemmatiser, the Danish UDPipe software and several data transformation utilities, produced all the linguistic annotations in the correct format. We conclude the paper with a report of a pilot evaluation of the quality of some of the linguistic annotations in the Danish ParlaMint corpus.

AB - In this paper we describe the Danish CLARIN resources, corpora, tools and workflow, which we used and enhanced in order to build the Danish ParlaMint corpus, as part of the CLARIN-funded ParlaMint project. More specifically, the article accounts for the manual and automatic processes involved in the preparation of the Danish parliamentary speeches, with focus on the CLARIN-DK tools and the Text Tonsorium workflow management. The tools annotated the speeches with metadata and linguistic information in compliance with the common ParlaMint TEI P5 format. As a spin-off of the project, the CLARIN-DK sentence tokenizer and the CST Named Entity Recognizer were improved. These tools, together with the CST lemmatiser, the Danish UDPipe software and several data transformation utilities, produced all the linguistic annotations in the correct format. We conclude the paper with a report of a pilot evaluation of the quality of some of the linguistic annotations in the Danish ParlaMint corpus.

M3 - Article in proceedings

SP - 70

EP - 73

BT - CLARIN Annual Conference 2021 Proceedings

PB - CLARIN ERIC

ER -

ID: 279626708