A Framework for Word Clustering of Bangla Sentences Using Higher Order N-gram Language Model

Authors
Husna, Asmaul [1 ]
Mostofa, Maliha [1 ]
Khatun, Ayesha [1 ]
Islam, Jahidul [1 ]
Mahin, Md. [1 ]
Affiliation
[1] Green Univ Bangladesh, Dept Comp Sci & Engn, Dhaka, Bangladesh
Keywords
Bangla language processing; word cluster; corpus; higher-order n-gram; threshold values
DOI
Not available
Chinese Library Classification
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology]
Discipline classification codes
0808; 0809
Abstract
Word clustering partitions a set of words into subsets of semantically similar words. It is crucial in many natural language processing applications, such as POS tagging, spell checking, grammar checking, word sense disambiguation, and many more. In this paper we propose a model based on a higher-order N-gram language model that clusters Bangla words efficiently according to similarity of meaning and context. N-gram rules are used to derive various types of probabilities for different forms of sentences. For implementation, we also propose a system that generates word clusters, which are tested against threshold values to justify the results. In experiments on a large corpus of Bangla sentences, the proposed model achieves an accuracy of approximately 89% for higher-order N-grams, which is quite satisfactory.
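The abstract does not spell out the clustering procedure, so the following is only a minimal sketch of the general idea (grouping words that occur in the same N-gram contexts, with a similarity threshold), not the authors' system. The function names, the cosine similarity measure, the greedy assignment, and the default `threshold=0.5` are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from math import sqrt


def context_profiles(sentences, n=3):
    """Map each word to a Counter of the N-gram contexts it occurs in.

    For every n-gram, each position is blanked in turn; the remaining
    words (with "_" marking the blank) form one context of that word.
    """
    profiles = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            for j, word in enumerate(gram):
                context = tuple(gram[:j] + ["_"] + gram[j + 1:])
                profiles[word][context] += 1
    return profiles


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster_words(sentences, n=3, threshold=0.5):
    """Greedy clustering (an assumption, not the paper's method).

    A word joins the first cluster whose first member has a context
    similarity of at least `threshold`; otherwise it starts a new cluster.
    """
    profiles = context_profiles(sentences, n)
    clusters = []
    for word in sorted(profiles):
        for cluster in clusters:
            if cosine(profiles[word], profiles[cluster[0]]) >= threshold:
                cluster.append(word)
                break
        else:
            clusters.append([word])
    return clusters
```

Calling `cluster_words(["ami bhat khai", "ami ruti khai", "ami boi pori"])` on this toy transliterated corpus groups `bhat` and `ruti` together, because both fill the same slot in the trigram `ami _ khai`, while `boi` stays in its own cluster. Raising `n` makes the contexts more specific, and raising the threshold makes clusters smaller.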
Pages: 6
Related papers
  • [31] Variable-order n-gram generation by word-class splitting and consecutive word grouping
    Masataki, H
    Sagisaka, Y
    [J]. 1996 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, CONFERENCE PROCEEDINGS, VOLS 1-6, 1996, : 188 - 191
  • [32] Short Text Clustering using Numerical data based on N-gram
    Kumar, Rajiv
    Mathur, Robin Prakash
    [J]. 2014 5TH INTERNATIONAL CONFERENCE CONFLUENCE THE NEXT GENERATION INFORMATION TECHNOLOGY SUMMIT (CONFLUENCE), 2014, : 274 - 276
  • [34] Combination of Random Indexing based Language Model and N-gram Language Model for Speech Recognition
    Fohr, Dominique
    Mella, Odile
    [J]. 14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013, : 2231 - 2235
  • [35] Multiclass composite N-gram language model based on connection direction
    Yamamoto, Hirofumi
    Sagisaka, Yoshinori
    [J]. Systems and Computers in Japan, 2003, 34 (07) : 108 - 114
  • [36] Fast Neural Network Language Model Lookups at N-Gram Speeds
    Huang, Yinghui
    Sethy, Abhinav
    Ramabhadran, Bhuvana
    [J]. 18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 274 - 278
  • [37] Topic-Dependent-Class-Based n-Gram Language Model
    Naptali, Welly
    Tsuchiya, Masatoshi
    Nakagawa, Seiichi
    [J]. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2012, 20 (05) : 1513 - 1525
  • [38] Next word prediction based on the N-gram model for Kurdish Sorani and Kurmanji
    Hamarashid, Hozan K.
    Saeed, Soran A.
    Rashid, Tarik A.
    [J]. NEURAL COMPUTING & APPLICATIONS, 2021, 33 (09) : 4547 - 4566
  • [40] A Novel Interpolated N-gram Language Model Based on Class Hierarchy
    Lv, Zhenyu
    Liu, Wenju
    Yang, Zhanlei
    [J]. IEEE NLP-KE 2009: PROCEEDINGS OF INTERNATIONAL CONFERENCE ON NATURAL LANGUAGE PROCESSING AND KNOWLEDGE ENGINEERING, 2009, : 473 - 477