Narrowing the field: Adapting place-name editions to place-name databases and overcoming structural variation problems

Research output: Contribution to conference › Conference abstract for conference › Research › peer-review

The printed place-name series Danmarks Stednavne (Place-names of Denmark) has been published since 1922, and volume 26 was released in 2013. Even so, the series covers only about two-thirds of the area of Denmark.

Since 2009, a parallel effort has been made to digitise the series through scanning and human-assisted character recognition. In the process, place-name data for the rest of the country, derived from cadastral databases and a database of medieval settlement names, has been added. The resulting database, currently holding about 200,000 entries, is published at www.danmarksstednavne.dk and draws heavily on the printed edition.
But the century-long effort of publishing in printed form has created a series of challenges for strict database integration, first of all variations in microstructure that make parsing into information categories (i.e. database fields) quite difficult.

As of now, no fewer than 45 different database fields have been found necessary to structure the information found in a single place-name entry, some mandatory and some optional. Because a relational database structure is used, some fields can occur multiple times within one entry (e.g. multiple source forms for one entry). Following the conscious decision to split the information into so many categories (i.e. fields) instead of employing a broad 'other information' field, sophisticated algorithms have been developed to identify the information category from typographical characteristics and the sequence of information in the series. Adding to the challenge is the macrostructural variation: the areas covered in printed form follow shifting principles for selecting the names to be published. Finding the right balance between letting the algorithms structure this complex digitised information and supplementing with manual work is crucial to the successful construction of a database that allows sophisticated searches while keeping the margin of error acceptably low.
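The relational layout described above, with mandatory and optional fields and a field that can occur several times per entry, can be illustrated with a minimal sketch. It is not the schema of the actual database: the table names, column names, helper functions and the sample place-name with its source forms are all invented for illustration, and only 2 of the 45 fields are modelled.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical two-table layout."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE entry (
            id        INTEGER PRIMARY KEY,
            headword  TEXT NOT NULL,  -- mandatory field (assumed name)
            etymology TEXT            -- optional field (assumed name)
        );
        -- A repeatable field lives in its own table, one row per occurrence.
        CREATE TABLE source_form (
            id       INTEGER PRIMARY KEY,
            entry_id INTEGER NOT NULL REFERENCES entry(id),
            form     TEXT NOT NULL,
            year     INTEGER
        );
        """
    )
    return conn


def add_entry(conn, headword, source_forms, etymology=None):
    """Insert one entry together with any number of (form, year) source forms."""
    cur = conn.execute(
        "INSERT INTO entry (headword, etymology) VALUES (?, ?)",
        (headword, etymology),
    )
    entry_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO source_form (entry_id, form, year) VALUES (?, ?, ?)",
        [(entry_id, form, year) for form, year in source_forms],
    )
    return entry_id


def forms_for(conn, headword):
    """Return all source forms of an entry, oldest first."""
    rows = conn.execute(
        """
        SELECT s.form, s.year
        FROM source_form AS s
        JOIN entry AS e ON e.id = s.entry_id
        WHERE e.headword = ?
        ORDER BY s.year
        """,
        (headword,),
    )
    return rows.fetchall()


if __name__ == "__main__":
    db = build_demo_db()
    # Invented placeholder data, not taken from Danmarks Stednavne.
    add_entry(db, "Exampleby", [("Exemplaby", 1450), ("Exempelby", 1300)])
    print(forms_for(db, "Exampleby"))
```

Keeping each repeatable field in a separate table is one common way to allow several occurrences per entry without a broad catch-all field; the real database may be organised differently.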
Original language: English
Publication date: 9 Oct 2013
Number of pages: 1
Publication status: Published - 9 Oct 2013
Event: Trends in Toponymy - Ruprecht-Karls-Universität, Heidelberg, Germany
Duration: 7 Oct 2013 → 10 Oct 2013

Conference

Conference: Trends in Toponymy
Location: Ruprecht-Karls-Universität
Country: Germany
City: Heidelberg
Period: 07/10/2013 → 10/10/2013

ID: 105729642