Text-based Language Identifier using Multinomial Naive Bayes Algorithm

Cited by: 0

Authors:

Rawat, Sunita [1]

Werulkar, Lakshita [1]

Jaywant, Sagarika [1]

Affiliations:

[1] Shri Ramdeobaba Coll Engn & Management, Dept Comp Sci & Engn, Nagpur, India

Source:

INTERNATIONAL JOURNAL OF NEXT-GENERATION COMPUTING | 2023, Vol. 14, No. 01

Keywords:

Language Identification; Natural Language Processing (NLP); Multinomial Naïve Bayes (MNB); N-Gram algorithm; Term Frequency-Inverse Document Frequency (TF-IDF)

DOI:

Not available

Chinese Library Classification number:

TP301 [Theory, Methods]

Discipline classification code:

081202

Abstract:

Language identification is a crucial step in any NLP-based application. The number of text-based documents and web pages on the Internet is growing rapidly, and documents written in languages from all over the world are available with a single click. A language identifier is therefore necessary to help users interpret this content. Language identification research has so far concentrated on European languages and remains rather limited for traditional Indian languages. Interest has also grown in language identification for closely related languages, not only for widely used ones. In this paper, the Multinomial Naive Bayes algorithm is used to identify three languages written in the Devanagari script (Marathi, Sanskrit, and Hindi) and three European languages (French, Italian, and English). An experiment on datasets for each language produced satisfactorily accurate results after training and testing the model.
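As an illustration of the general technique named in the abstract, the following is a minimal sketch of a Multinomial Naive Bayes classifier over character n-grams with Laplace smoothing. It is not the authors' implementation. The class name `NgramNaiveBayes`, the helper `char_ngrams`, the bigram size, the smoothing constant, and the tiny training sentences are all assumptions made for this example. The keywords also mention TF-IDF weighting; this sketch uses raw n-gram counts instead to stay short.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=2):
    """Return overlapping character n-grams of the lowercased, space-padded text."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NgramNaiveBayes:
    """Hypothetical multinomial Naive Bayes over character n-gram counts."""

    def __init__(self, n=2, alpha=1.0):
        self.n = n          # n-gram length (assumed value)
        self.alpha = alpha  # Laplace smoothing constant (assumed value)

    def fit(self, texts, labels):
        self.class_docs = Counter(labels)    # documents per language
        self.counts = defaultdict(Counter)   # n-gram counts per language
        for text, label in zip(texts, labels):
            self.counts[label].update(char_ngrams(text, self.n))
        self.vocab = {g for c in self.counts.values() for g in c}
        self.totals = {lab: sum(c.values()) for lab, c in self.counts.items()}
        return self

    def log_scores(self, text):
        """Log prior plus summed smoothed log likelihoods for each language."""
        n_docs = sum(self.class_docs.values())
        vocab_size = len(self.vocab)
        grams = [g for g in char_ngrams(text, self.n) if g in self.vocab]
        scores = {}
        for lab, doc_count in self.class_docs.items():
            denom = self.totals[lab] + self.alpha * vocab_size
            score = math.log(doc_count / n_docs)
            for g in grams:  # n-grams never seen in training are skipped
                score += math.log((self.counts[lab][g] + self.alpha) / denom)
            scores[lab] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Toy training data, invented for this sketch (not the paper's datasets).
train = [
    ("the quick brown fox jumps over the lazy dog", "english"),
    ("this is a simple sentence written in english", "english"),
    ("le renard brun saute par dessus le chien paresseux", "french"),
    ("ceci est une phrase simple écrite en français", "french"),
    ("la volpe marrone salta sopra il cane pigro", "italian"),
    ("questa è una frase semplice scritta in italiano", "italian"),
    ("यह हिंदी में लिखा गया एक सरल वाक्य है", "hindi"),
    ("तेज भूरी लोमड़ी आलसी कुत्ते के ऊपर कूदती है", "hindi"),
]
texts, labels = zip(*train)
model = NgramNaiveBayes(n=2, alpha=1.0).fit(texts, labels)

for sample in ["the dog is lazy", "le chien brun", "il cane salta", "यह एक सरल वाक्य है"]:
    print(sample, "->", model.predict(sample))
```

Character n-grams are used here because they need no tokenizer and work the same way for Latin and Devanagari text. Telling apart languages that share a script, such as Marathi, Sanskrit, and Hindi, would need far more training text than this toy example contains.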


Pages: 96-102

Page count: 7

