Chinese word classification based on statistics

Cited by: 0

Authors:

Zhao, SW [1]

Xia, Y [1]

Ma, SP [1]

Wang, Y [1]

Su, Z [1]

Affiliations:

[1] Tsing Hua Univ, Dept Comp Sci & Technol, State Key Lab Intelligent Technol & Syst, Beijing 100084, Peoples R China

Source:

PROCEEDINGS OF THE 3RD WORLD CONGRESS ON INTELLIGENT CONTROL AND AUTOMATION, VOLS 1-5 | 2000

Keywords:

Chinese words classification; mutual information; class-based language model;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

Statistics-based Chinese word classification plays an important role in natural language processing applications such as speech recognition and intelligent Chinese input methods. We first gather statistics from a large-scale text corpus, and then use the average mutual information as the global cost function to cluster all Chinese words into a predefined number of classes, using a hybrid approach that combines top-down splitting and bottom-up merging. The classification results are encouraging and can be used in class-based language models.
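To illustrate the cost function named in the abstract, the following is a minimal Python sketch of the average mutual information between the classes of adjacent words, together with a naive greedy bottom-up merging loop. It is not the authors' implementation: the function names, the bigram-count input format, and the toy data are assumptions made for illustration, and the top-down splitting half of the hybrid approach is not shown.

```python
import math
from collections import Counter
from itertools import combinations


def average_mutual_information(bigrams, word_class):
    """Average mutual information between the classes of adjacent words.

    bigrams:    Counter mapping (w1, w2) -> count (hypothetical input format)
    word_class: dict mapping each word to a class id
    """
    pair, left, right = Counter(), Counter(), Counter()
    total = 0
    for (w1, w2), n in bigrams.items():
        c1, c2 = word_class[w1], word_class[w2]
        pair[(c1, c2)] += n
        left[c1] += n
        right[c2] += n
        total += n
    ami = 0.0
    for (c1, c2), n in pair.items():
        # p(c1, c2) * log2( p(c1, c2) / (p(c1) * p(c2)) )
        ami += (n / total) * math.log2(n * total / (left[c1] * right[c2]))
    return ami


def best_merge(bigrams, word_class):
    """Try every pair of classes and return the merge that keeps AMI highest."""
    classes = sorted(set(word_class.values()))
    best = None
    for a, b in combinations(classes, 2):
        merged = {w: (a if c == b else c) for w, c in word_class.items()}
        score = average_mutual_information(bigrams, merged)
        if best is None or score > best[0]:
            best = (score, merged)
    return best


def cluster(bigrams, num_classes):
    """Greedy bottom-up merging: start with one class per word, then merge
    until only num_classes classes remain (naive and slow, for illustration)."""
    words = sorted({w for bigram in bigrams for w in bigram})
    word_class = {w: i for i, w in enumerate(words)}
    while len(set(word_class.values())) > num_classes:
        _, word_class = best_merge(bigrams, word_class)
    return word_class


# Toy bigram counts (invented for this sketch, not taken from the paper).
toy_bigrams = Counter({
    ("the", "cat"): 3, ("the", "dog"): 3,
    ("a", "cat"): 2, ("a", "dog"): 2,
    ("cat", "the"): 1, ("dog", "a"): 1,
})
toy_classes = cluster(toy_bigrams, 2)
```

This sketch re-evaluates the cost function from scratch for every candidate merge, which is only practical for very small vocabularies. A real system over a large corpus would need incremental updates of the cost function.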

Citation

Pages: 2753-2756

Page count: 4