Text-based Language Identifier using Multinomial Naive Bayes Algorithm

Times cited: 0
Authors
Rawat, Sunita [1]
Werulkar, Lakshita [1]
Jaywant, Sagarika [1]
Affiliation
[1] Shri Ramdeobaba College of Engineering and Management, Department of Computer Science and Engineering, Nagpur, India
Keywords
Language Identification; Natural Language Processing (NLP); Multinomial Naïve Bayes (MNB); N-Gram algorithm; Term Frequency-Inverse Document Frequency (TF-IDF)
DOI
Not available
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Language identification is a crucial step in any NLP-based application. The number of text-based documents and webpages on the Internet is growing rapidly, and documents written in many different languages from all over the world are available with a single click. A language identifier is therefore necessary to help users interpret this content. So far, language identification research has concentrated mainly on European languages and remains limited for traditional Indian languages. Researchers have also become increasingly interested in identifying closely related languages, not only widely used ones. In this paper, the Multinomial Naive Bayes algorithm is used to detect three languages written in the Devanagari script (Marathi, Sanskrit, and Hindi) and three European languages (French, Italian, and English). Experiments on a dataset for each language produced satisfactorily accurate results after the model was trained and tested.
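The abstract names the Multinomial Naive Bayes (MNB) algorithm, and the keywords mention N-grams. The sketch below illustrates that general technique only. It is not the authors' implementation: the character n-gram features (n = 1 to 3), the Laplace smoothing constant alpha = 1.0, the class name MultinomialNBLanguageID, and the toy training sentences are all hypothetical choices made for illustration. Each language c is scored as log P(c) + sum over g of x_g * log P(g | c), where x_g is the count of n-gram g in the input and P(g | c) = (N_{c,g} + alpha) / (N_c + alpha * |V|). It uses raw counts rather than the TF-IDF weighting listed in the keywords.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n_values=(1, 2, 3)):
    """Count character n-grams of a lowercased, space-padded string (assumed feature set)."""
    padded = f" {text.lower()} "
    grams = Counter()
    for n in n_values:
        for i in range(len(padded) - n + 1):
            grams[padded[i:i + n]] += 1
    return grams


class MultinomialNBLanguageID:
    """Hypothetical MNB language identifier over character n-gram counts."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace/additive smoothing constant (assumed value)

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)       # documents per language, for the prior
        self.gram_counts = defaultdict(Counter)   # language -> n-gram -> count
        for text, label in zip(texts, labels):
            self.gram_counts[label].update(char_ngrams(text))
        self.vocab = set()                        # n-grams seen in any language
        for counts in self.gram_counts.values():
            self.vocab.update(counts)
        self.totals = {c: sum(cnt.values()) for c, cnt in self.gram_counts.items()}
        return self

    def log_scores(self, text):
        """Return log P(c) + sum_g x_g * log P(g | c) for every language c."""
        grams = char_ngrams(text)
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        scores = {}
        for label in self.class_counts:
            score = math.log(self.class_counts[label] / n_docs)  # log prior
            denom = self.totals[label] + self.alpha * v
            for gram, count in grams.items():
                # Counter returns 0 for unseen n-grams; smoothing keeps the log finite.
                numer = self.gram_counts[label][gram] + self.alpha
                score += count * math.log(numer / denom)
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Hypothetical toy training data, not the paper's dataset.
TRAIN = [
    ("the cat sat on the mat", "english"),
    ("this is a small example of english text", "english"),
    ("where is the nearest station", "english"),
    ("le chat est sur le tapis", "french"),
    ("ceci est un petit exemple de texte français", "french"),
    ("où est la gare la plus proche", "french"),
    ("यह एक छोटा उदाहरण है", "hindi"),
    ("बिल्ली चटाई पर बैठी है", "hindi"),
    ("सबसे नज़दीकी स्टेशन कहाँ है", "hindi"),
]
model = MultinomialNBLanguageID().fit([t for t, _ in TRAIN], [l for _, l in TRAIN])

for sample in ["the station is near", "la gare est proche", "यह उदाहरण है"]:
    print(sample, "->", model.predict(sample))
```

With three sentences per language, this toy model only separates languages whose character n-grams barely overlap. Telling Hindi, Marathi, and Sanskrit apart, which share the Devanagari script, would need real per-language datasets like those the abstract describes.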
Pages: 96-102
Number of pages: 7
Related papers
50 in total
  • [1] Text-Based Gender Classification of Twitter Data using Naive Bayes and SVM Algorithm
    Angeles, Angelic
    Quintos, Maria Nikki
    Octaviano, Manolito, Jr.
    Raga, Rodolofo, Jr.
    2021 IEEE Region 10 Conference (TENCON 2021), 2021, pp. 522-526.
  • [2] Using Naive Bayes Method to Classify Text-based Email
    Kang, LanLan
    Chen, Ruey-Shun
    Chen, Yeh-Cheng
    Cao, WenLiang
    2018 9th International Conference on Parallel Architectures, Algorithms and Programming (PAAP 2018), 2018, pp. 94-98.
  • [3] Multinomial naive Bayes for text categorization revisited
    Kibriya, AM
    Frank, E
    Pfahringer, B
    Holmes, G
    AI 2004: Advances in Artificial Intelligence, Proceedings, 2004, vol. 3339, pp. 488-499.
  • [4] Classify Text-based Email Using Naive Bayes Method With Small Sample
    Zhu, Yanjun
    Zhu, Ting
    Li, Jianxin
    Cao, Wenliang
    Yong, Peng
    Jiang, Fei
    Liu, Jie
    Journal of Information Science and Engineering, 2023, vol. 39, no. 4, pp. 855-868.
  • [5] Personality Classification based on Facebook status text using Multinomial Naive Bayes method
    Artissa, Y. B. N. D.
    Asror, I
    Faraby, S. A.
    2nd International Conference on Data and Information Science, 2019, vol. 1192.
  • [6] Modifying Naive Bayes Classifier for Multinomial Text Classification
    Sharma, Neha
    Singh, Manoj
    2016 International Conference on Recent Advances and Innovations in Engineering (ICRAIE), 2016.
  • [7] Discrimination-based feature selection for multinomial naive Bayes text classification
    Zhu, Jingbo
    Wang, Huizhen
    Zhang, Xijuan
    Computer Processing of Oriental Languages, Proceedings: Beyond the Orient: The Research Challenges Ahead, 2006, vol. 4285, pp. 149ff.
  • [8] Word Embedding based Multinomial Naive Bayes Algorithm for Spam Filtering
    Kadam, Sumedh
    Gala, Aayush
    Gehlot, Pritesh
    Kurup, Aditya
    Ghag, Kranti
    2018 Fourth International Conference on Computing Communication Control and Automation (ICCUBEA), 2018.
  • [9] Multinomial Naive Bayes using similarity based conditional probability
    Santhi, B.
    Brindha, G. R.
    Journal of Intelligent & Fuzzy Systems, 2019, vol. 36, no. 2, pp. 1431-1441.
  • [10] Text Classification Based on Naive Bayes Algorithm with Feature Selection
    Chen, Zhenguo
    Shi, Guang
    Wang, Xiaoju
    Information: An International Interdisciplinary Journal, 2012, vol. 15, no. 10, pp. 4255-4260.