Learning the lexicon from raw texts for open-vocabulary Korean word recognition
Edinburgh, Scotland August 03-August 06
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/ICDAR.2003.1227659
Seventh International Conference on D ...
Sungho Ryu, KAIST
In this paper, we propose a novel method of building a language model for open-vocabulary Korean word recognition. Because Korean morphology is complex, lexicons based on linguistic entities such as words and morphemes are ill-suited to open-vocabulary domains. Instead, we build the lexicon by collecting variable-length character sequences from raw texts using a dynamic Bayesian network model of the language.
In simulated word recognition experiments, the proposed language model found the correct words in lattices of character candidates in 94.3% of cases, increasing the word recognition rate by 20.9%.
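To make the idea of picking a word out of a lattice of character candidates concrete, the following is a minimal sketch, not the method of the paper. It substitutes a plain character bigram model with add-one smoothing for the dynamic Bayesian network, and runs a Viterbi search over the lattice. All names (`train_bigram`, `log_prob`, `decode`), the toy corpus, and the recognizer scores are assumptions made up for illustration.

```python
import math
from collections import Counter


def train_bigram(corpus_words):
    """Count character bigrams in raw words, with start/end markers."""
    unigrams, bigrams = Counter(), Counter()
    for word in corpus_words:
        chars = ["<s>"] + list(word) + ["</s>"]
        for prev, cur in zip(chars, chars[1:]):
            unigrams[prev] += 1
            bigrams[(prev, cur)] += 1
    vocab = {c for w in corpus_words for c in w} | {"</s>"}
    return unigrams, bigrams, vocab


def log_prob(prev, cur, model):
    """Add-one smoothed log P(cur | prev); approximate for unseen characters."""
    unigrams, bigrams, vocab = model
    return math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + len(vocab)))


def decode(lattice, model):
    """Viterbi search over a lattice.

    lattice: one list per character position, each holding
    (candidate_char, recognizer_log_score) pairs.
    Returns the string with the best combined recognizer + language score.
    """
    # best[c] = (score, path) of the best partial path ending in character c
    best = {"<s>": (0.0, "")}
    for candidates in lattice:
        nxt = {}
        for ch, rec_score in candidates:
            nxt[ch] = max(
                (score + log_prob(prev, ch, model) + rec_score, path + ch)
                for prev, (score, path) in best.items()
            )
        best = nxt
    # Close every path with the end-of-word marker and keep the best one.
    final = max(
        (score + log_prob(prev, "</s>", model), path)
        for prev, (score, path) in best.items()
    )
    return final[1]


# Toy data (hypothetical): a tiny raw-text corpus and a three-position lattice
# in which the recognizer slightly prefers the wrong first and last characters.
model = train_bigram(["한국어", "한국", "국어", "한글"])
lattice = [
    [("한", math.log(0.4)), ("힌", math.log(0.6))],
    [("국", math.log(0.5)), ("굴", math.log(0.5))],
    [("어", math.log(0.45)), ("이", math.log(0.55))],
]
result = decode(lattice, model)
print(result)
```

In this toy setup the language model score is added to the recognizer score in the log domain, so a candidate that the recognizer ranks second can still win when the surrounding characters make it much more plausible. A lexicon of variable-length character sequences, as the abstract describes, would replace the fixed bigram context used here.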
Citation:
Sungho Ryu, Jin Hyung Kim, "Learning the lexicon from raw texts for open-vocabulary Korean word recognition," Seventh International Conference on Document Analysis and Recognition (ICDAR'03) - Volume 1, vol. 1, pp. 202, 2003