Word Clustering Algorithms Based on Word Similarity

Times cited: 2

Author:

Yuan, Lichi [1]

Affiliation:

[1] Jiangxi Univ Finance & Econ, Sch Informat Technol, Nanchang 330013, Peoples R China

Source:

2015 7TH INTERNATIONAL CONFERENCE ON INTELLIGENT HUMAN-MACHINE SYSTEMS AND CYBERNETICS IHMSC 2015, VOL I | 2015

Keywords:

Word similarity; Word clustering; Statistical language model

DOI:

10.1109/IHMSC.2015.36

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

The class-based statistical language model is an important way to address the data sparseness problem, but it has two bottlenecks: (1) word clustering: it is hard to find a clustering method that performs well without requiring a large amount of computation; (2) class-based methods tend to lose some predictive power when adapting to text from different domains. To address these problems, a definition of word similarity based on mutual information is presented, and from it a definition of word-set similarity is derived. Experiments show that the similarity-based word clustering algorithm outperforms the conventional greedy clustering method in both speed and performance, reducing perplexity from 283 to 218.
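
The abstract does not give the paper's mutual-information similarity or its word-set similarity. The following Python sketch therefore substitutes common stand-ins, all of them assumptions rather than the author's method:

- words are represented by positive pointwise mutual information (PPMI) with their left and right neighbours, and compared by cosine similarity;
- word-set similarity is the average pairwise word similarity (average linkage);
- clusters are formed by greedy agglomerative merging down to k classes.

The resulting classes are then scored with an unsmoothed class bigram model, P(w_i | w_{i-1}) = P(c(w_i) | c(w_{i-1})) P(w_i | c(w_i)), and the perplexity is reported. The function names (pmi_vectors, cluster_words, class_bigram_perplexity) and the toy corpus are hypothetical.

```python
import math
from collections import Counter, defaultdict


def pmi_vectors(tokens):
    """ASSUMED similarity basis: per-word vector of positive PMI with left ("L") / right ("R") neighbours."""
    pairs = list(zip(tokens, tokens[1:]))
    n = len(pairs)
    joint = Counter(pairs)
    first = Counter(a for a, _ in pairs)
    second = Counter(b for _, b in pairs)
    vecs = defaultdict(dict)
    for (a, b), c in joint.items():
        pmi = math.log(c * n / (first[a] * second[b]))
        if pmi > 0:  # keep only positive associations (PPMI)
            vecs[a][("R", b)] = pmi  # b follows a
            vecs[b][("L", a)] = pmi  # a precedes b
    return vecs


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(x * v.get(key, 0.0) for key, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def cluster_words(tokens, k):
    """Greedy agglomerative clustering of the vocabulary into k classes (hypothetical helper)."""
    vecs = pmi_vectors(tokens)
    vocab = sorted(set(tokens))
    sim = {(a, b): cosine(vecs.get(a, {}), vecs.get(b, {})) for a in vocab for b in vocab}
    clusters = [[w] for w in vocab]

    def set_sim(s, t):
        # ASSUMED word-set similarity: mean pairwise word similarity (average linkage).
        return sum(sim[a, b] for a in s for b in t) / (len(s) * len(t))

    while len(clusters) > k:
        # Merge the most similar pair of clusters; j > i, so popping j leaves index i valid.
        i, j = max(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: set_sim(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] += clusters.pop(j)
    return {w: cid for cid, members in enumerate(clusters) for w in members}


def class_bigram_perplexity(tokens, classes):
    """Perplexity of an unsmoothed class bigram model, P(w|h) = P(c(w)|c(h)) * P(w|c(w))."""
    cls = [classes[w] for w in tokens]
    word_count = Counter(tokens)
    class_count = Counter(cls)
    class_pairs = Counter(zip(cls, cls[1:]))
    class_first = Counter(cls[:-1])
    log_prob = 0.0
    for i in range(1, len(tokens)):
        p_class = class_pairs[cls[i - 1], cls[i]] / class_first[cls[i - 1]]
        p_word = word_count[tokens[i]] / class_count[cls[i]]
        log_prob += math.log(p_class * p_word)
    return math.exp(-log_prob / (len(tokens) - 1))


corpus = "the cat runs . the dog runs . a cat sleeps . a dog sleeps .".split()
classes = cluster_words(corpus, k=4)
print(class_bigram_perplexity(corpus, classes))  # equals 2 ** (11 / 15), roughly 1.66
```

On this toy corpus the sketch groups {cat, dog}, {the, a}, {runs, sleeps} and {.}, because words in each pair share identical neighbour contexts.

The perplexity is computed on the training text itself and without smoothing. It only illustrates the metric. It is not comparable to the 283 and 218 figures reported in the abstract.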


Pages: 21-24

Page count: 4
