Vari-gram Language Model Based On Category

Times cited: 0
Author
Yuan, Lichi [1]
Affiliation
[1] Jiangxi University of Finance and Economics, School of Information Technology, Nanchang 330013, People's Republic of China
Keywords
Word clustering; statistical language model; vari-gram language model
DOI
10.4028/www.scientific.net/AMM.58-60.995
CLC number (Chinese Library Classification)
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
Category-based statistical language models are an important way to address the data sparseness problem. However, this approach faces two bottlenecks: (1) word clustering, since it is hard to find a clustering method that performs well without requiring a large amount of computation; and (2) class-based methods always lose some predictive ability when adapting to text from different domains. This paper attempts to solve both problems. It presents a novel definition of word similarity and, building on it, defines the similarity between word sets. Experiments show that the similarity-based word clustering algorithm outperforms the conventional greedy clustering method in both speed and performance. In addition, the paper presents a new method for building the vari-gram model.
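The abstract relies on the category-based (class-based) model as its remedy for sparse data. As a hedged illustration only, the sketch below shows the conventional class-based bigram factorization P(w_i | w_{i-1}) = P(w_i | c(w_i)) · P(c(w_i) | c(w_{i-1})), which lets words in the same class share transition statistics. It is not the paper's similarity-based clustering or its vari-gram construction: the word-to-class map is hand-written and hypothetical, the function names are illustrative, and estimates are plain maximum likelihood with no smoothing.

```python
from collections import Counter

def train_class_bigram(tokens, word2class):
    """Collect maximum-likelihood counts for a class-based bigram model (no smoothing)."""
    classes = [word2class[w] for w in tokens]
    return {
        "word": Counter(tokens),                             # N(w)
        "class": Counter(classes),                           # N(c)
        "class_bigram": Counter(zip(classes, classes[1:])),  # N(c_prev, c)
        "class_history": Counter(classes[:-1]),              # N(c_prev) as a left context
    }

def class_bigram_prob(prev, word, word2class, model):
    """P(word | prev) = P(word | c(word)) * P(c(word) | c(prev))."""
    c_prev, c_word = word2class[prev], word2class[word]
    if model["class"][c_word] == 0 or model["class_history"][c_prev] == 0:
        return 0.0  # unseen class: a real model would smooth or back off here
    p_word_in_class = model["word"][word] / model["class"][c_word]
    p_class_transition = (
        model["class_bigram"][(c_prev, c_word)] / model["class_history"][c_prev]
    )
    return p_word_in_class * p_class_transition

# Toy corpus and a hypothetical, hand-written word-to-class map.
tokens = "the cat sat the dog sat the cat ran".split()
word2class = {"the": "DET", "cat": "N", "dog": "N", "sat": "V", "ran": "V"}
model = train_class_bigram(tokens, word2class)

# "dog ran" never occurs as a word bigram, yet it receives probability mass
# because the class transition N -> V is well attested.
print(class_bigram_prob("dog", "ran", word2class, model))
```

Here the unseen word pair "dog ran" gets probability 1/3 (P(ran | V) = 1/3 times P(V | N) = 1), whereas a word-level maximum-likelihood bigram would assign it zero. This sharing is what class-based models gain; the abstract's second bottleneck is the flip side, since every word in a class inherits the same transitions.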
Pages: 995-1000
Page count: 6