The purpose of this study is to develop a method of constructing a probabilistic hierarchical structure from a statistical analysis of a Japanese corpus by combining Kameya and Sato's statistical language analysis [7] with Rose's model [10]. First, the co-occurrence frequencies of adjectives and nouns are calculated from a Japanese corpus based on modification relations. Second, latent classes are extracted by applying the statistical language analysis to the co-occurrence data. Third, the centroid vectors of the latent classes are calculated from the analysis results, and a probabilistic hierarchical structure of the latent classes is constructed using Rose's model. Finally, the conditional probabilities of the categories given the latent classes are computed as the association probabilities of the latent classes to the categories, and the conditional probabilities of the categories given the concepts are computed as the association probabilities of the concepts to the categories.
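The four steps can be illustrated with a minimal NumPy sketch. It rests on several assumptions that the abstract does not state: the statistical language analysis is modeled as a latent class model P(n, a) = Σ_c P(c) P(n|c) P(a|c) fitted by EM; the centroid vector of a latent class is taken to be its adjective distribution P(a|c); and Rose's model is represented by the deterministic-annealing fixed-point iteration at one fixed temperature, which yields a single level of categories rather than the full hierarchy (in Rose's formulation, clusters split as the temperature is lowered). The function names `fit_latent_classes` and `deterministic_annealing` and the toy count matrix are hypothetical and do not come from the paper's corpus.

```python
import numpy as np


def _normalize(x, axis):
    """Divide by the sum along `axis`, guarding against division by zero."""
    s = x.sum(axis=axis, keepdims=True)
    return x / np.maximum(s, 1e-300)


def fit_latent_classes(counts, n_classes, n_iter=100, seed=0):
    """Hypothetical EM fit of P(n, a) = sum_c P(c) P(n|c) P(a|c).

    counts: (nouns x adjectives) co-occurrence frequency matrix.
    Returns P(c) with shape (C,), P(n|c) with shape (N, C), P(a|c) with shape (A, C).
    """
    rng = np.random.default_rng(seed)
    n_nouns, n_adjs = counts.shape
    p_c = np.full(n_classes, 1.0 / n_classes)
    p_n_c = _normalize(rng.random((n_nouns, n_classes)), axis=0)
    p_a_c = _normalize(rng.random((n_adjs, n_classes)), axis=0)
    for _ in range(n_iter):
        # E-step: posterior P(c | n, a), shape (N, A, C).
        joint = p_c[None, None, :] * p_n_c[:, None, :] * p_a_c[None, :, :]
        post = _normalize(joint, axis=2)
        # M-step: re-estimate every distribution from the expected counts.
        weighted = counts[:, :, None] * post
        p_n_c = _normalize(weighted.sum(axis=1), axis=0)
        p_a_c = _normalize(weighted.sum(axis=0), axis=0)
        p_c = _normalize(weighted.sum(axis=(0, 1)), axis=0)
    return p_c, p_n_c, p_a_c


def _associations(points, centroids, temperature):
    """Gibbs association probabilities p(j | x) ∝ exp(-||x - y_j||^2 / T)."""
    d2 = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    logits = -d2 / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    return _normalize(np.exp(logits), axis=1)


def deterministic_annealing(points, weights, n_categories, temperature,
                            n_iter=200, seed=0):
    """Fixed-point iteration of deterministic annealing at one temperature."""
    rng = np.random.default_rng(seed)
    mean = weights @ points / weights.sum()
    # Start all category centroids near the weighted mean, slightly perturbed.
    y = mean + 1e-3 * rng.standard_normal((n_categories, points.shape[1]))
    for _ in range(n_iter):
        assoc = _associations(points, y, temperature)
        w = weights[:, None] * assoc  # (latent classes, categories)
        y = (w.T @ points) / w.sum(axis=0)[:, None]
    return y, _associations(points, y, temperature)


# Toy noun-by-adjective counts (invented for illustration only).
counts = np.array([[8, 6, 0, 1],
                   [7, 9, 1, 0],
                   [0, 1, 9, 7],
                   [1, 0, 6, 8]], dtype=float)

p_c, p_n_c, p_a_c = fit_latent_classes(counts, n_classes=2)
p_c_given_n = _normalize(p_n_c * p_c, axis=1)  # P(c | n) by Bayes' rule
centroids = p_a_c.T                            # assumed centroid vectors
_, p_cat_given_c = deterministic_annealing(centroids, p_c,
                                           n_categories=2, temperature=0.05)
# P(category | noun) = sum_c P(category | c) P(c | noun)
p_cat_given_n = p_c_given_n @ p_cat_given_c
print(np.round(p_cat_given_n, 3))
```

The last step marginalizes over latent classes, so each noun's category probabilities are a mixture of its latent classes' association probabilities. This is one plausible way to connect "categories given latent classes" to "categories given concepts"; it is not confirmed as the paper's exact computation.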
Citation:
Asuka Terai, Bin Liu, Masanori Nakagawa, "A Method for the Construction of a Probabilistic Hierarchical Structure Based on a Statistical Analysis of a Large-scale Corpus," in Proc. International Conference on Semantic Computing (ICSC 2007), pp. 129-136, 2007.