Donato Malerba, Università degli Studi di Bari, via Orabona, 4 - 70126 Bari - Italy
Advances in genome sequencing techniques have given rise to an overwhelming increase in the literature on discovered genes, proteins and their role in biological processes. However, the biomedical literature remains a largely unexploited source of biological information. Information Extraction (IE) techniques are necessary to map this information into structured representations that allow facts relating domain-relevant entities to be automatically recognized. In this paper, we present a framework that supports biologists in the task of automatically extracting information from texts. The framework integrates a data mining module that discovers extraction rules from a set of manually labelled texts. The resulting extraction models are subsequently applied automatically to unseen texts. We report an application to a real-world dataset composed of publications selected to support biologists in the annotation of the HmtDB database.
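The learn-then-apply workflow described in the abstract can be illustrated with a minimal sketch. It is a deliberately simple stand-in and not the paper's data mining module, whose rule language is not described here. It assumes that a "rule" is nothing more than a trigger word that tends to precede a labelled entity. The function names `learn_rules` and `extract`, the `min_support` threshold, the labels, and the toy sentences are all hypothetical and invented for illustration.

```python
from collections import Counter, defaultdict


def learn_rules(labelled_texts, min_support=2):
    """Learn toy rules of the form 'previous word -> entity label'.

    labelled_texts: list of (tokens, labels) pairs, where labels[i] is the
    manual annotation of tokens[i], or None if the token is not an entity.
    This is an illustrative assumption, not the rule format of the paper.
    """
    counts = defaultdict(Counter)
    for tokens, labels in labelled_texts:
        for i in range(1, len(tokens)):
            if labels[i] is not None:
                # Count how often each preceding word introduces each label.
                counts[tokens[i - 1].lower()][labels[i]] += 1

    rules = {}
    for trigger, label_counts in counts.items():
        label, support = label_counts.most_common(1)[0]
        if support >= min_support:  # keep only sufficiently frequent rules
            rules[trigger] = label
    return rules


def extract(tokens, rules):
    """Apply learned rules to an unseen, unlabelled token sequence."""
    return [
        (tokens[i], rules[tokens[i - 1].lower()])
        for i in range(1, len(tokens))
        if tokens[i - 1].lower() in rules
    ]


# Hypothetical manually labelled training sentences.
training = [
    ("the mutation A3243G was found in gene MT-TL1".split(),
     [None, None, "MUTATION", None, None, None, None, "GENE"]),
    ("a mutation T8993G affects gene MT-ATP6".split(),
     [None, None, "MUTATION", None, None, "GENE"]),
]

rules = learn_rules(training)
found = extract("we observed mutation G11778A in gene MT-ND4".split(), rules)
```

In this sketch, `rules` maps the trigger words "mutation" and "gene" to their labels, and `found` holds the entities recognized in the unseen sentence. A realistic system would need richer rule representations and linguistic preprocessing than this single-word context.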
Citation:
Margherita Berardi, Donato Malerba, Marcella Attimonelli, "Mining Information Extraction Models for HmtDB annotation," Sixth IEEE International Conference on Data Mining - Workshops (ICDMW'06), pp. 207-212, 2006