A Framework for Word Clustering of Bangla Sentences Using Higher Order N-gram Language Model

Cited by: 0

Authors:

Husna, Asmaul [1]

Mostofa, Maliha [1]

Khatun, Ayesha [1]

Islam, Jahidul [1]

Mahin, Md. [1]

Affiliations:

[1] Green Univ Bangladesh, Dept Comp Sci & Engn, Dhaka, Bangladesh

Source:

2018 INTERNATIONAL CONFERENCE ON INNOVATION IN ENGINEERING AND TECHNOLOGY (ICIET) | 2018

Keywords:

Bangla language processing; word cluster; corpus; higher order n-gram; threshold values

DOI:

Not available

Chinese Library Classification (CLC):

TM [Electrical engineering]; TN [Electronic technology, communication technology]

Discipline classification codes:

0808; 0809

Abstract:

Word clustering is a method for partitioning a set of words into subsets of semantically similar words. It is crucial in many natural language processing applications, such as POS tagging, spell checking, grammar checking, word sense disambiguation, and many more. In this paper we propose a model that uses a higher order N-gram language model to cluster Bangla words efficiently, based on similarity of meaning and context. N-gram rules are used to propagate various types of probabilities for different forms of sentences. For the implementation, we also propose a system that generates different word clusters and tests them against threshold values to justify the results. In experiments on a large corpus of Bangla sentences, the proposed model achieves an accuracy of approximately 89% for higher order N-grams, which is quite satisfactory.
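
The abstract does not give the clustering procedure, so the following is only a minimal sketch of the general idea (words that fill the same slot in the same n-grams are grouped when their similarity passes a threshold), not the authors' method. The function names, the Jaccard similarity measure, the greedy grouping rule, the default threshold, and the romanized toy sentences are all assumptions made for illustration.

```python
from collections import defaultdict


def context_sets(sentences, n=3):
    """Map each word to the set of n-gram contexts in which it occurs.

    A context is the n-gram with the word's own position replaced by "_".
    """
    contexts = defaultdict(set)
    for sentence in sentences:
        tokens = sentence.split()
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            for j, word in enumerate(gram):
                contexts[word].add(tuple(gram[:j]) + ("_",) + tuple(gram[j + 1:]))
    return contexts


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_words(sentences, n=3, threshold=0.5):
    """Greedily group words whose context sets overlap by at least `threshold`.

    Each word joins the first cluster whose first member is similar enough;
    otherwise it starts a new cluster. This is a hypothetical grouping rule.
    """
    contexts = context_sets(sentences, n)
    clusters = []
    for word in sorted(contexts):
        for cluster in clusters:
            if jaccard(contexts[word], contexts[cluster[0]]) >= threshold:
                cluster.append(word)
                break
        else:
            clusters.append([word])
    return clusters


# Romanized toy sentences (hypothetical data, not from the paper's corpus).
toy_corpus = ["ami bhat khai", "ami ruti khai", "ami boi pori"]
clusters = cluster_words(toy_corpus, n=3, threshold=0.5)
print(clusters)
```

In this toy corpus, "bhat" and "ruti" occur in the identical trigram context and so end up in one cluster, while the remaining words stay in singleton clusters. Raising `n` makes contexts more specific, and the threshold controls how much context overlap is required before two words are grouped.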

Pages: 6
