A Framework for Word Clustering of Bangla Sentences Using Higher Order N-gram Language Model

Citations: 0
Authors
Husna, Asmaul [1 ]
Mostofa, Maliha [1 ]
Khatun, Ayesha [1 ]
Islam, Jahidul [1 ]
Mahin, Md. [1 ]
Affiliation
[1] Green Univ Bangladesh, Dept Comp Sci & Engn, Dhaka, Bangladesh
Keywords
Bangla language processing; word cluster; corpus; higher-order n-gram; threshold values
DOI
Not available
Chinese Library Classification (CLC)
TM [Electrical engineering]; TN [Electronics and communication technology];
Discipline classification code
0808 ; 0809 ;
Abstract
Word clustering partitions a set of words into subsets of semantically similar words. It is crucial in many natural language processing applications, such as POS tagging, spell checking, grammar checking, word sense disambiguation, and many more. In this paper we propose a model based on a higher-order N-gram language model that clusters Bangla words efficiently according to similarity of meaning and context. N-gram rules are used to derive various probabilities for different forms of sentences. For implementation, we also propose a system that generates word clusters and tests them against threshold values to justify the results. In experiments on a large corpus of Bangla sentences, the proposed model achieves an accuracy of approximately 89% for higher-order N-grams, which is quite satisfactory.
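The abstract does not give the similarity measure, the N-gram order, or the threshold values, so the following is only a minimal sketch of the general idea: words that occur in the same N-gram contexts are grouped when their context overlap reaches a threshold. The Jaccard overlap, the function names (`context_sets`, `jaccard`, `cluster_words`), the default `n=3`, and the default `threshold=0.5` are all assumptions made for illustration, not the authors' method.

```python
from collections import defaultdict


def context_sets(sentences, n=3):
    """Map each word to the set of n-gram contexts it appears in.

    A context is the n-gram with the word's own slot replaced by "_".
    Tokenization by whitespace is an assumption for this sketch.
    """
    contexts = defaultdict(set)
    for sentence in sentences:
        tokens = sentence.split()
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            for j, word in enumerate(gram):
                ctx = tuple(gram[:j]) + ("_",) + tuple(gram[j + 1:])
                contexts[word].add(ctx)
    return contexts


def jaccard(a, b):
    """Jaccard overlap of two sets (assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_words(sentences, n=3, threshold=0.5):
    """Group words whose context overlap is at least `threshold`."""
    contexts = context_sets(sentences, n)
    words = sorted(contexts)
    parent = {w: w for w in words}  # union-find forest over words

    def find(w):
        while parent[w] != w:
            parent[w] = parent[parent[w]]  # path halving
            w = parent[w]
        return w

    for i, w1 in enumerate(words):
        for w2 in words[i + 1:]:
            if jaccard(contexts[w1], contexts[w2]) >= threshold:
                parent[find(w2)] = find(w1)

    clusters = defaultdict(set)
    for w in words:
        clusters[find(w)].add(w)
    return list(clusters.values())


if __name__ == "__main__":
    # Toy corpus: "I eat rice" / "I eat bread" in Bangla.
    toy = ["আমি ভাত খাই", "আমি রুটি খাই"]
    for cluster in cluster_words(toy, n=3, threshold=0.5):
        print(sorted(cluster))
```

On the toy corpus, ভাত (rice) and রুটি (bread) share the context (আমি, _, খাই) and land in one cluster, while the other words stay alone. Raising `n` narrows what counts as a shared context, which is the sense in which a higher-order model is stricter about contextual similarity.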
Pages: 6