In Chinese, a word consisting of one or more characters is the basic syntactically meaningful unit; however, each character in a word also has a definite meaning of its own. In this paper, we compare the perplexities of four n-gram language models (character-based bigram, character-based trigram, word-based bigram and class-based bigram) and their influence on the performance of contextual post-processing of Chinese scripts in an offline handwritten Chinese character recognition system. We also examine in detail how the candidate set size affects the performance of contextual post-processing, and show that the number of candidates should vary from script to script.
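As an illustration of the perplexity measure used to compare the language models, the following is a minimal sketch of a character-based bigram model, not the implementation used in the paper. It assumes add-one (Laplace) smoothing and a tiny toy corpus, neither of which comes from the paper; the function names `train_char_bigram` and `perplexity` are hypothetical. Perplexity is computed as 2 raised to the negative average log2 probability per predicted token.

```python
import math
from collections import Counter

BOS, EOS = "<s>", "</s>"  # sentence-boundary markers (an assumption of this sketch)

def train_char_bigram(corpus):
    """Collect context counts, bigram counts and the set of predictable tokens."""
    context_counts, bigram_counts = Counter(), Counter()
    vocab = set()
    for sentence in corpus:
        tokens = [BOS] + list(sentence) + [EOS]
        vocab.update(tokens[1:])  # every token except BOS can be predicted
        for prev, cur in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1
    return context_counts, bigram_counts, vocab

def perplexity(sentences, context_counts, bigram_counts, vocab):
    """Perplexity under an add-one smoothed character bigram model."""
    log_prob, n_tokens = 0.0, 0
    v = len(vocab)
    for sentence in sentences:
        tokens = [BOS] + list(sentence) + [EOS]
        for prev, cur in zip(tokens, tokens[1:]):
            # Add-one smoothing: unseen bigrams keep a small nonzero probability.
            p = (bigram_counts[(prev, cur)] + 1) / (context_counts[prev] + v)
            log_prob += math.log2(p)
            n_tokens += 1
    return 2 ** (-log_prob / n_tokens)

# Toy corpus, purely illustrative.
corpus = ["今天天气好", "今天下雨", "明天天气好"]
ctx, bi, vocab = train_char_bigram(corpus)
print(perplexity(["今天天气好"], ctx, bi, vocab))  # plausible character sequence
print(perplexity(["好气天天今"], ctx, bi, vocab))  # same characters, implausible order
```

A lower perplexity means the model finds the character sequence more predictable. In contextual post-processing, this kind of score is what lets a language model prefer one sequence of recognition candidates over another.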
Citation:
Yuan-Xiang Li and Chew Lim Tan, "Influence of Language Models and Candidate Set Size on Contextual Post-processing for Chinese Script Recognition," in Proceedings of the 17th International Conference on Pattern Recognition (ICPR'04), vol. 2, pp. 537-540, 2004.