Word clustering based on similarity and vari-gram language model

Times cited: 0

Authors:

Yuan, LC [1]

Zhong, YX [1]

Affiliation:

[1] Beijing Univ Posts & Telecommun, Coll Informat Engn, Beijing 100876, Peoples R China

Source:

ICCC2004: Proceedings of the 16th International Conference on Computer Communication, Vols 1 and 2 | 2004

Keywords:

word clustering; Statistical Language Model; vari-gram;

DOI:

Not available

CLC classification number:

TP [Automation technology, computer technology]

Subject classification code:

0812

Abstract:

Class-based statistical language models are an important method for addressing the data sparseness problem. However, such models face two bottlenecks: (1) word clustering, since it is hard to find a clustering method that performs well without requiring a large amount of computation; and (2) class-based methods always lose some predictive ability when adapting to text from different domains. This paper attempts to solve both problems. It presents a novel definition of word similarity and, based on that, defines word-set similarity. Experiments show that the similarity-based word clustering algorithm outperforms the conventional greedy clustering method in both speed and performance. The paper also presents a new method for building the vari-gram model.
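The abstract does not give the authors' actual definitions of word similarity or word-set similarity, so the Python sketch below only illustrates the general shape of similarity-based bottom-up word clustering under stated assumptions. Word similarity is assumed to be the cosine of left/right neighbour-count vectors, and word-set similarity is assumed to be the average pairwise word similarity (average linkage). Both definitions, and the names `context_vectors`, `word_similarity`, `set_similarity`, and `cluster_words`, are hypothetical stand-ins, not the paper's method. Classes produced this way could feed a class-based bigram model of the usual form P(w_i | w_{i-1}) ≈ P(c(w_i) | c(w_{i-1})) · P(w_i | c(w_i)).

```python
from collections import Counter, defaultdict
from itertools import combinations
from math import sqrt


def context_vectors(tokens):
    """Count left and right neighbours of each word (an assumed feature choice)."""
    vecs = defaultdict(Counter)
    for i, w in enumerate(tokens):
        if i > 0:
            vecs[w]["L:" + tokens[i - 1]] += 1
        if i + 1 < len(tokens):
            vecs[w]["R:" + tokens[i + 1]] += 1
    return vecs


def word_similarity(a, b):
    """Cosine similarity of two context-count vectors (hypothetical definition)."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def set_similarity(s, t, vecs):
    """Average pairwise word similarity between two word sets (average linkage)."""
    total = sum(word_similarity(vecs[x], vecs[y]) for x in s for y in t)
    return total / (len(s) * len(t))


def cluster_words(tokens, num_classes):
    """Start with one set per word; repeatedly merge the two most similar sets."""
    vecs = context_vectors(tokens)
    clusters = [frozenset([w]) for w in sorted(vecs)]
    while len(clusters) > num_classes:
        # Find the pair of clusters with the highest set similarity.
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda p: set_similarity(clusters[p[0]], clusters[p[1]], vecs),
        )
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters
```

For example, on the toy sentence "the cat sat on the mat the dog sat on the rug", `cat` and `dog` have identical neighbour counts and are merged first. This naive loop recomputes every pairwise set similarity on each merge, so it only suits small vocabularies; a practical version would cache and incrementally update those values.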


Pages: 1222-1226

Number of pages: 5
