A Guide to Dictionary-Based Text Mining

Publication: Contribution to book/anthology/report › Book chapter › Research › peer-reviewed

Helen V. Cook
Lars Juhl Jensen

PubMed contains more than 27 million documents, and this number is growing at an estimated 4% per year. Even within specialized topics, it is no longer possible for a researcher to read any field in its entirety, and thus nobody has a complete picture of the scientific knowledge in any given field at any time. Text mining provides a means to automatically read this corpus and to extract the relations found therein as structured information. Having data in a structured format is a huge boon for computational efforts to access, cross-reference, and mine that data. This is increasingly useful as biological research is becoming more focused on systems and multi-omics integration. This chapter provides an overview of the steps that are required for text mining: tokenization, named entity recognition, normalization, event extraction, and benchmarking. It discusses a variety of approaches to these tasks and then goes into detail on how to prepare data for use specifically with the JensenLab tagger. This software uses a dictionary-based approach and provides the text mining evidence for STRING and several other databases.
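
As a rough illustration of how tokenization, dictionary-based named entity recognition, and normalization fit together, the following is a minimal Python sketch. It is not the JensenLab tagger and does not reflect its input format or matching rules; the dictionary entries, the identifiers (`GENE:0001`, `GENE:0002`), and the `tag` function are all hypothetical and chosen only for the example. It assumes a simple word-character tokenizer and leftmost-longest matching against lowercased names.

```python
import re

# Hypothetical toy dictionary: lowercase surface form -> normalized identifier.
# Names and identifiers are illustrative only, not taken from any real resource.
DICTIONARY = {
    "tp53": "GENE:0001",
    "p53": "GENE:0001",
    "tumor protein p53": "GENE:0001",
    "brca1": "GENE:0002",
}

# Assumed tokenizer: runs of word characters (letters, digits, underscore).
TOKEN_RE = re.compile(r"\w+")

# Longest dictionary name, measured in tokens, bounds the search window.
MAX_TOKENS = max(len(name.split()) for name in DICTIONARY)


def tag(text):
    """Return (start, end, matched_text, identifier) for each dictionary hit."""
    tokens = [(m.start(), m.end()) for m in TOKEN_RE.finditer(text)]
    matches = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate span first (leftmost-longest matching).
        for n in range(min(MAX_TOKENS, len(tokens) - i), 0, -1):
            start, end = tokens[i][0], tokens[i + n - 1][1]
            candidate = " ".join(text[s:e].lower() for s, e in tokens[i:i + n])
            if candidate in DICTIONARY:
                # Normalization step: map the surface form to its identifier.
                matches.append((start, end, text[start:end], DICTIONARY[candidate]))
                i += n
                break
        else:
            # No dictionary name starts at this token; move on.
            i += 1
    return matches
```

Calling `tag("Tumor protein p53 binds BRCA1.")` under these assumptions yields one match for the three-token name and one for `BRCA1`, each carrying character offsets and a normalized identifier. A real dictionary-based system would additionally need to handle spelling variants, ambiguous names, and stop-listed terms, none of which this sketch attempts.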

Original language	English
Title of host publication	Bioinformatics and Drug Discovery
Editors	Richard S. Larson, Tudor I. Oprea
Number of pages	17
Volume	1939
Publisher	Humana Press
Publication date	2019
Edition	3
Pages	73-89
ISBN (Print)	978-1-4939-9088-7
ISBN (Electronic)	978-1-4939-9089-4
DOI	https://doi.org/10.1007/978-1-4939-9089-4_5
Status	Published - 2019

Series name	Methods in Molecular Biology
ISSN	1064-3745

ID: 223876548