Word Segmentation for the Sequences Emitted from a Word-Valued Source
Aizu-Wakamatsu City, Fukushima, Japan October 16-October 19
DOI Bookmark: http://doi.ieeecomputersociety.org/10.1109/CIT.2007.170
7th IEEE International Conference on ...
Takashi Ishida, Waseda University
Toshiyasu Matsushima, Waseda University
Shigeichi Hirasawa, Waseda University
Word segmentation is the most fundamental and important process in Japanese and Chinese language processing. Because these languages do not mark boundaries between words, a sequence must first be separated into words. For this problem, the approach based on probabilistic language models is known to be highly efficient, and this has been shown in practice. Meanwhile, the word-valued source (WVS) has recently been proposed as a new class of source model for the source coding problem. This model can be expected to reflect more of the probabilistic structure of natural languages. A Japanese or Chinese sentence may be regarded as a sequence emitted from a non-prefix-free WVS. In this paper, as the first phase of applying WVS to natural language processing, we formulate a word segmentation problem for sequences emitted from a non-prefix-free WVS. We then examine the word segmentation performance for these models by numerical computation.
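To make the problem statement concrete, the following is a minimal sketch of maximum-probability segmentation, not the formulation used in the paper. It assumes a memoryless word-valued source in which words are drawn independently from a known distribution; the function name `segment`, the table `word_probs`, and the toy vocabulary are hypothetical. The vocabulary is non-prefix-free because "a" is a prefix of "ab", so the same sequence admits several segmentations.

```python
import math


def segment(sequence, word_probs):
    """Most probable segmentation of `sequence` under an assumed i.i.d. word model.

    Returns (words, log_probability), or (None, -inf) if no segmentation exists.
    """
    n = len(sequence)
    max_len = max(len(w) for w in word_probs)
    # best[i] is the highest log-probability of any segmentation of sequence[:i].
    best = [-math.inf] * (n + 1)
    back = [0] * (n + 1)  # back[i] is where the last word of that segmentation starts
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            p = word_probs.get(sequence[start:end])
            if p is None or best[start] == -math.inf:
                continue
            score = best[start] + math.log(p)
            if score > best[end]:
                best[end] = score
                back[end] = start
    if best[n] == -math.inf:
        return None, -math.inf
    # Follow the back-pointers from the end of the sequence to recover the words.
    words = []
    i = n
    while i > 0:
        words.append(sequence[back[i]:i])
        i = back[i]
    words.reverse()
    return words, best[n]


# Hypothetical toy source: non-prefix-free, since "a" is a prefix of "ab".
toy_probs = {"a": 0.4, "ab": 0.3, "b": 0.2, "ba": 0.1}
words, log_p = segment("aba", toy_probs)
print(words, math.exp(log_p))
```

For the sequence "aba" the candidates are a|b|a (probability 0.032), a|ba (0.04), and ab|a (0.12), so the sketch selects ab|a. The dynamic program avoids enumerating every segmentation, which matters because the number of candidates can grow exponentially with sequence length.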
Citation:
Takashi Ishida, Toshiyasu Matsushima, Shigeichi Hirasawa, "Word Segmentation for the Sequences Emitted from a Word-Valued Source," cit, pp.662-661, 7th IEEE International Conference on Computer and Information Technology (CIT 2007), 2007