The DIAsDEM Framework for Converting Domain-Specific Texts into XML Documents with Data Mining Techniques
San Jose, California November 29-December 02
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/ICDM.2001.989515
First IEEE International Conference o ...
Modern organizations are accumulating huge volumes of textual documents. To turn archives into valuable knowledge sources, textual content must become explicit and queryable. Semantic tagging with markup languages such as XML satisfies both requirements. We thus introduce the DIAsDEM* framework for extracting semantics from structural text units (e.g., sentences), assigning XML tags to them and deriving a flat XML DTD for the archive. DIAsDEM focuses on archives characterized by a peculiar terminology and by an implicit structure such as court filings and company reports. In the knowledge discovery phase, text units are iteratively clustered by similarity of their content. Each iteration outputs clusters satisfying a set of quality criteria. Text units contained in these clusters are tagged with semi-automatically determined cluster labels and XML tags respectively. Additionally, extracted named entities (e.g., persons) serve as attributes of XML tags. We apply the framework in a case study on the German Commercial Register.
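The pipeline outlined in the abstract (cluster text units iteratively, keep clusters that pass quality criteria, tag their units, derive a flat DTD) can be illustrated with a minimal Python sketch. It is not the DIAsDEM implementation. Every name in it is hypothetical. Jaccard similarity, greedy single-pass clustering, a minimum cluster size as the only quality criterion, a looser threshold in each later iteration, fully automatic labels (the paper's are semi-automatic), and a toy regular expression standing in for named-entity extraction are all assumptions made for illustration.

```python
import re
from collections import Counter
from xml.sax.saxutils import escape, quoteattr

# Toy stopword list (assumption; a real system would use domain resources).
STOPWORDS = {"the", "a", "of", "is", "to", "as", "and", "in"}


def terms(unit):
    """Lower-cased content words of a text unit (toy feature extraction)."""
    return {w for w in re.findall(r"[a-z]+", unit.lower()) if w not in STOPWORDS}


def jaccard(a, b):
    """Jaccard similarity of two term sets; 0.0 if both are empty."""
    return len(a & b) / len(a | b) if a | b else 0.0


def greedy_clusters(units, threshold):
    """Single pass: a unit joins the first cluster whose seed is similar enough."""
    clusters = []
    for u in units:
        for c in clusters:
            if jaccard(terms(u), terms(c[0])) >= threshold:
                c.append(u)
                break
        else:
            clusters.append([u])
    return clusters


def label(cluster):
    """Most frequent term in the cluster (ties broken alphabetically)."""
    counts = Counter(t for u in cluster for t in terms(u))
    return min(counts, key=lambda t: (-counts[t], t))


def tag_archive(units, thresholds=(0.5, 0.3), min_size=2):
    """Iteratively cluster the still-untagged units, one threshold per iteration.

    Clusters with at least min_size units are accepted (stand-in quality
    criterion) and their units are mapped to the cluster label.
    """
    tagged, remaining = {}, list(units)
    for th in thresholds:
        for c in greedy_clusters(remaining, th):
            if len(c) >= min_size:
                name = label(c)
                for u in c:
                    tagged[u] = name
        remaining = [u for u in remaining if u not in tagged]
    return tagged, remaining


def to_xml(units, tagged):
    """Wrap tagged units in their XML tag; a toy person match becomes an attribute."""
    lines = []
    for u in units:
        if u in tagged:
            persons = re.findall(r"\b(?:Mr|Ms)\. ([A-Z][a-z]+)", u)
            attr = f" Person={quoteattr(persons[0])}" if persons else ""
            lines.append(f"<{tagged[u]}{attr}>{escape(u)}</{tagged[u]}>")
        else:
            lines.append(escape(u))
    return "\n".join(lines)


def flat_dtd(tagged, root="Archive"):
    """Flat DTD: the root has mixed content, every tag holds text only."""
    tags = sorted(set(tagged.values()))
    content = " | ".join(["#PCDATA"] + tags)
    lines = [f"<!ELEMENT {root} ({content})*>"]
    for t in tags:
        lines.append(f"<!ELEMENT {t} (#PCDATA)>")
        lines.append(f"<!ATTLIST {t} Person CDATA #IMPLIED>")
    return "\n".join(lines)


if __name__ == "__main__":
    # Invented example sentences, not data from the case study.
    units = [
        "Mr. Smith is appointed as managing director.",
        "Ms. Jones is appointed as managing director.",
        "The share capital amounts to 50000 EUR.",
        "The share capital is increased to 75000 EUR.",
        "The company moved.",
    ]
    tagged, untagged = tag_archive(units)
    print(to_xml(units, tagged))
    print(flat_dtd(tagged))
```

In this sketch, units that never end up in an accepted cluster stay untagged and appear as plain text under the root element, which is why the root is declared with mixed content.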
Citation:
Henner Graubitz, Myra Spiliopoulou, Karsten Winkler, "The DIAsDEM Framework for Converting Domain-Specific Texts into XML Documents with Data Mining Techniques," icdm, pp.171, First IEEE International Conference on Data Mining (ICDM'01), 2001