Vari-gram Language Model Based On Category

Cited by: 0
Authors
Yuan, Lichi [1]
Affiliations
[1] Jiangxi Univ Finance & Econ Nanchang, Sch Informat Technol, Nanchang 330013, Peoples R China
Keywords
Word clustering; statistical language model; vari-gram language model
DOI
10.4028/www.scientific.net/AMM.58-60.995
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Category-based statistical language models are an important way to address the sparse data problem. However, they have two bottlenecks: (1) word clustering, where it is hard to find a clustering method that performs well without a large amount of computation; and (2) class-based methods always lose some predictive ability when adapting to text from different domains. This paper tries to solve both problems. It presents a novel definition of word similarity and, based on it, defines word set similarity. Experiments show that the word clustering algorithm based on similarity outperforms the conventional greedy clustering method in both speed and performance. The paper also presents a new method for building the vari-gram model.
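To make concrete why a category-based model helps with sparse data, here is a minimal sketch of the standard class-based bigram factorization, P(w_i | w_{i-1}) ≈ P(c_i | c_{i-1}) · P(w_i | c_i), with each word assigned to exactly one class. It is not the clustering algorithm or the vari-gram construction proposed in the paper, neither of which is specified in the abstract. The function names, the hand-written word-to-class map, and the toy corpus are all assumptions made for illustration.

```python
from collections import Counter


def train_class_bigram(sentences, word_to_class):
    """Collect maximum-likelihood counts for a class-based bigram model.

    Hypothetical helper: assumes every word has exactly one class.
    """
    class_bigrams = Counter()  # count(c_prev, c)
    class_context = Counter()  # count of c_prev used as a bigram history
    class_totals = Counter()   # count(c) over all tokens
    word_counts = Counter()    # count(w) over all tokens
    for sentence in sentences:
        classes = [word_to_class[w] for w in sentence]
        for w, c in zip(sentence, classes):
            word_counts[w] += 1
            class_totals[c] += 1
        for c_prev, c in zip(classes, classes[1:]):
            class_bigrams[(c_prev, c)] += 1
            class_context[c_prev] += 1
    return class_bigrams, class_context, class_totals, word_counts


def class_bigram_prob(prev_word, word, model, word_to_class):
    """Return P(class(word) | class(prev_word)) * P(word | class(word))."""
    class_bigrams, class_context, class_totals, word_counts = model
    c_prev, c = word_to_class[prev_word], word_to_class[word]
    if class_context[c_prev] == 0 or class_totals[c] == 0:
        return 0.0  # no smoothing in this sketch
    p_class = class_bigrams[(c_prev, c)] / class_context[c_prev]
    p_word = word_counts[word] / class_totals[c]
    return p_class * p_word


# Toy data (assumed): classes are assigned by hand rather than by clustering.
word_to_class = {
    "the": "DET", "a": "DET",
    "cat": "NOUN", "dog": "NOUN",
    "sleeps": "VERB", "runs": "VERB",
}
sentences = [["the", "cat", "sleeps"], ["a", "dog", "runs"]]
model = train_class_bigram(sentences, word_to_class)

# "the dog" never occurs in the corpus, so a word bigram estimate would be 0,
# but the class-level statistics still give it probability mass.
p_unseen = class_bigram_prob("the", "dog", model, word_to_class)
```

In this toy corpus the unseen pair "the dog" still receives a nonzero probability because the DET to NOUN transition was observed, which is the effect the abstract refers to. The cost is that all words in a class share the same transition statistics, which is one way class-based models lose word-specific predictive ability.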
Pages: 995-1000
Page count: 6
Related papers
50 in total
  • [21] W-n-gram: a hybrid language model
    Wang, XL
    Yeung, DS
    Liu, JNK
    Luk, R
    Wang, X
    [J]. IC-AI'2000: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 1-III, 2000, : 1265 - 1269
  • [22] Development of the N-gram Model for Azerbaijani Language
    Bannayeva, Aliya
    Aslanov, Mustafa
    [J]. 2020 IEEE 14TH INTERNATIONAL CONFERENCE ON APPLICATION OF INFORMATION AND COMMUNICATION TECHNOLOGIES (AICT2020), 2020,
  • [23] A graphical language for quantum protocols based on the category of cobordisms
    DorDevic, Dusan
    Petric, Zoran
    Zekic, Mladen
    [J]. QUANTUM STUDIES-MATHEMATICS AND FOUNDATIONS, 2024, 11 (03) : 643 - 671
  • [24] English grammar intelligent error correction technology based on the n-gram language model
    Xiao, Fan
    Yin, Shehui
    [J]. JOURNAL OF INTELLIGENT SYSTEMS, 2024, 33 (01)
  • [25] A Corpus Based Unsupervised Bangla Word Stemming Using N-Gram Language Model
    Urmi, Tapashee Tabassum
    Jammy, Jasmine Jahan
    Ismail, Sabir
    [J]. 2016 5TH INTERNATIONAL CONFERENCE ON INFORMATICS, ELECTRONICS AND VISION (ICIEV), 2016, : 824 - 828
  • [26] Dynamic Language Model Adaptation Using Keyword Category Classification
    Yamamoto, Hitoshi
    Hanazawa, Ken
    Miki, Kiyokazu
    Shinoda, Koichi
    [J]. 11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4, 2010, : 2426 - +
  • [27] IMPROVEMENTS TO N-GRAM LANGUAGE MODEL USING TEXT GENERATED FROM NEURAL LANGUAGE MODEL
    Suzuki, Masayuki
    Itoh, Nobuyasu
    Nagano, Tohru
    Kurata, Gakuto
    Thomas, Samuel
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 7245 - 7249
  • [28] A variant of n-gram based language classification
    Tomovic, Andrija
    Janicic, Predrag
    [J]. AI*IA 2007: ARTIFICIAL INTELLIGENCE AND HUMAN-ORIENTED COMPUTING, 2007, 4733 : 410 - +
  • [29] Combination of word-based and category-based language models
    Niesler, TR
    Woodland, PC
    [J]. ICSLP 96 - FOURTH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, VOLS 1-4, 1996, : 220 - 223
  • [30] A Category-Based Model for ABAC
    Fernandez, Maribel
    Thuraisingham, Bhavani
    [J]. PROCEEDINGS OF THE THIRD ACM WORKSHOP ON ATTRIBUTE-BASED ACCESS CONTROL (ABAC'18), 2018, : 32 - 34