Chinese word classification based on statistics

Cited by: 0
Authors
Zhao, SW [1]
Xia, Y [1]
Ma, SP [1]
Wang, Y [1]
Su, Z [1]
Affiliations
[1] Tsing Hua Univ, Dept Comp Sci & Technol, State Key Lab Intelligent Technol & Syst, Beijing 100084, Peoples R China
Keywords
Chinese words classification; mutual information; class-based language model
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Statistics-based classification of Chinese words plays an important role in natural language processing applications such as speech recognition and intelligent Chinese input methods. We first collect statistics from a large-scale text corpus, and then use the average mutual information as the global cost function for clustering all Chinese words into a predefined number of classes with a hybrid approach that combines top-down splitting and bottom-up merging. The classification results are encouraging and can be used in a class-based language model.
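The abstract names average mutual information as the global cost function but does not give its formula. The sketch below assumes the usual class-bigram definition from the class-based n-gram literature, I = Σ p(c1, c2) · log2[ p(c1, c2) / (p(c1) · p(c2)) ], summed over adjacent class pairs. The function name, the toy bigram counts, and the two class assignments are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def average_mutual_information(bigram_counts, word_to_class):
    """Average mutual information (in bits) between adjacent classes.

    bigram_counts: dict mapping (word1, word2) -> count (illustrative input).
    word_to_class: dict mapping each word -> class id (illustrative input).
    """
    # Project word bigram counts onto class bigram counts.
    class_bigrams = Counter()
    for (w1, w2), n in bigram_counts.items():
        class_bigrams[(word_to_class[w1], word_to_class[w2])] += n

    total = sum(class_bigrams.values())

    # Marginal counts of a class in the left and right bigram positions.
    left, right = Counter(), Counter()
    for (c1, c2), n in class_bigrams.items():
        left[c1] += n
        right[c2] += n

    ami = 0.0
    for (c1, c2), n in class_bigrams.items():
        p12 = n / total
        p1 = left[c1] / total
        p2 = right[c2] / total
        ami += p12 * math.log2(p12 / (p1 * p2))
    return ami


# Toy counts, invented purely for illustration.
toy_bigrams = {("我", "吃"): 2, ("你", "喝"): 2, ("吃", "饭"): 2, ("喝", "水"): 2}

# Assignment A groups words that occur in similar contexts.
assignment_a = {"我": 0, "你": 0, "吃": 1, "喝": 1, "饭": 2, "水": 2}
# Assignment B mixes pronouns with verbs.
assignment_b = {"我": 0, "吃": 0, "你": 1, "喝": 1, "饭": 2, "水": 2}

good = average_mutual_information(toy_bigrams, assignment_a)
bad = average_mutual_information(toy_bigrams, assignment_b)
```

Under this assumed definition, a clustering procedure would prefer the split or merge that keeps the score highest, so assignment A scores above assignment B on the toy counts. How the paper schedules its splitting and merging steps is not stated in the abstract and is not modelled here.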
Pages: 2753-2756
Number of pages: 4
Related papers
8 items in total
  • [1] Brown P. F., 1992, Computational Linguistics, V18, P467
  • [2] CHANG CH, 1995, STUDY CORPUS BASED C
  • [3] CHANG CH, 1994, P COLING 94 KYOT JAP
  • [4] GAO J, 1997, PROBABILISITIC WORD
  • [5] JELINEK F, 1990, CLASSIFYING WORDS IM
  • [6] MACKAY DJC, 1995, INFERENCE LEARINING
  • [7] MCMAHON J, 1994, THESIS Q U BELFAST
  • [8] MCMAHON J, 1995, COMPUTATIONAL LINGUI