Vari-gram Language Model Based On Category

Authors
Yuan, Lichi [1]
Affiliation
[1] Jiangxi Univ Finance & Econ Nanchang, Sch Informat Technol, Nanchang 330013, Peoples R China
Keywords
Word clustering; statistical language model; Vari-gram language model
DOI
10.4028/www.scientific.net/AMM.58-60.995
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
The category-based statistical language model is an important method for addressing the data sparsity problem. However, this model has two bottlenecks: (1) word clustering: it is hard to find a clustering method that performs well without a large amount of computation; (2) class-based methods always lose some predictive ability when adapting to text from different domains. This paper attempts to solve these problems. It presents a novel definition of word similarity and, based on it, defines the similarity of word sets. Experiments show that the word clustering algorithm based on this similarity outperforms the conventional greedy clustering method in both speed and performance. The paper also presents a new method for constructing the vari-gram model.
Pages: 995-1000
Number of pages: 6