Text-based Language Identifier using Multinomial Naive Bayes Algorithm

Cited by: 0
Authors
Rawat, Sunita [1]
Werulkar, Lakshita [1]
Jaywant, Sagarika [1]
Affiliations
[1] Shri Ramdeobaba Coll Engn & Management, Dept Comp Sci & Engn, Nagpur, India
Source
Keywords
Language Identification; Natural Language Processing (NLP); Multinomial Naïve Bayes (MNB); N-Gram algorithm; Term Frequency-Inverse Document Frequency (TF-IDF)
DOI
Not available
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Language identification is a crucial step in any NLP-based application. The number of text-based documents and webpages on the Internet is increasing rapidly, and documents written in languages from all across the world are available with a single click. A language identifier is therefore necessary to help the user interpret the content. Language identification research has so far concentrated on European languages and remains rather limited for traditional Indian languages. Many researchers have shifted their interest from popular languages to the identification of similar languages. In this paper, the Multinomial Naive Bayes algorithm is used to detect three languages written in the Devanagari script (Marathi, Sanskrit and Hindi) and three European languages (French, Italian and English). An experiment on datasets for each language produced satisfactorily accurate results after training and testing the model.
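As a rough illustration of the kind of classifier the abstract describes, the following is a minimal sketch of Multinomial Naive Bayes over character n-grams, written in plain Python. It is not the authors' implementation: the toy sentences, the bigram size, the Laplace smoothing constant `alpha`, and all function and class names are assumptions made for this example. It covers only the three European languages, and it uses raw n-gram counts rather than the TF-IDF weighting listed in the keywords.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=2):
    """Return overlapping character n-grams of a lowercased, space-padded string."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class MultinomialNB:
    """Toy Multinomial Naive Bayes with Laplace smoothing (illustrative only)."""

    def __init__(self, alpha=1.0, n=2):
        self.alpha = alpha  # smoothing constant (assumed value)
        self.n = n          # n-gram length (assumed value)

    def fit(self, texts, labels):
        self.counts = defaultdict(Counter)  # per-language n-gram counts
        self.docs = Counter(labels)         # per-language document counts
        for text, label in zip(texts, labels):
            self.counts[label].update(char_ngrams(text, self.n))
        self.vocab = {g for c in self.counts.values() for g in c}
        self.totals = {lab: sum(c.values()) for lab, c in self.counts.items()}
        return self

    def log_scores(self, text):
        """Return log prior + sum of smoothed log likelihoods for each language."""
        n_docs = sum(self.docs.values())
        vocab_size = len(self.vocab)
        scores = {}
        for lab in self.docs:
            score = math.log(self.docs[lab] / n_docs)
            denom = self.totals[lab] + self.alpha * vocab_size
            for g in char_ngrams(text, self.n):
                if g in self.vocab:  # ignore n-grams never seen in training
                    score += math.log((self.counts[lab][g] + self.alpha) / denom)
            scores[lab] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Hypothetical toy corpus; a real experiment would need far more data per language.
train = [
    ("the cat is sleeping on the chair", "English"),
    ("where is the railway station", "English"),
    ("le chat dort sur la chaise", "French"),
    ("où est la gare s'il vous plaît", "French"),
    ("il gatto dorme sulla sedia", "Italian"),
    ("dove si trova la stazione", "Italian"),
]
model = MultinomialNB().fit([t for t, _ in train], [l for _, l in train])

print(model.predict("the cat is on the chair"))
print(model.predict("le chat est sur la chaise"))
print(model.predict("il gatto dorme"))
```

Character n-grams are used here because they need no tokenizer and work the same way for any script, so the same sketch could in principle be trained on Devanagari text; whether the paper builds its features this way is not stated in the abstract.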
Pages: 96-102
Number of pages: 7
Related papers
50 in total
  • [41] An Improved Naive Bayes Text Classification Algorithm In Chinese Information Processing
    Yuan, Lingling
    [J]. THIRD INTERNATIONAL SYMPOSIUM ON COMPUTER SCIENCE AND COMPUTATIONAL TECHNOLOGY (ISCSCT 2010), 2010, : 267 - 269
  • [42] Examining n-grams and Multinomial Naive Bayes Classifier for Identifying the Author of the Text "Epistle to the Hebrews"
    Satos, Panagiotis
    Stylios, Chrysostomos
    [J]. ICAART: PROCEEDINGS OF THE 14TH INTERNATIONAL CONFERENCE ON AGENTS AND ARTIFICIAL INTELLIGENCE - VOL 1, 2022, : 447 - 458
  • [43] Parallel naive Bayes algorithm for large-scale Chinese text classification based on spark
    Liu Peng
    Zhao Hui-han
    Teng Jia-yu
    Yang Yan-yan
    Liu Ya-feng
    Zhu Zong-wei
    [J]. JOURNAL OF CENTRAL SOUTH UNIVERSITY, 2019, 26 (01) : 1 - 12
  • [44] Feature Selection Based on Sampling and C4.5 Algorithm to Improve the Quality of Text Classification Using Naive Bayes
    Molano, Viviana
    Cobos, Carlos
    Mendoza, Martha
    Herrera-Viedma, Enrique
    Manic, Milos
    [J]. HUMAN-INSPIRED COMPUTING AND ITS APPLICATIONS, PT I, 2014, 8856 : 80 - 91
  • [45] Combining naive Bayes and n-gram language models for text classification
    Peng, FC
    Schuurmans, D
    [J]. ADVANCES IN INFORMATION RETRIEVAL, 2003, 2633 : 335 - 350
  • [46] Boosting Naive Bayes Text Categorization by Using Cloud Model
    Wan, Jian
    He, Tingting
    Chen, Jinguang
    Dong, Jinling
    [J]. 2011 INTERNATIONAL CONFERENCE ON COMPUTER, ELECTRICAL, AND SYSTEMS SCIENCES, AND ENGINEERING (CESSE 2011), 2011, : 165 - +
  • [47] A Scalable Text Classification Using Naive Bayes with Hadoop Framework
    Temesgen, Mulualem Mheretu
    Lemma, Dereje Teferi
    [J]. INFORMATION AND COMMUNICATION TECHNOLOGY FOR DEVELOPMENT FOR AFRICA (ICT4DA 2019), 2019, 1026 : 291 - 300
  • [48] Naive bayes text categorization using improved feature selection
    Lin, Kunhui
    Kang, Kai
    Huang, Yunping
    Zhou, Changle
    Wang, Beizhan
    [J]. Journal of Computational Information Systems, 2007, 3 (03): : 1159 - 1164
  • [49] Naive Bayes Classifier for depression detection using text data
    Samanvitha, S.
    Bindiya, A. R.
    Sudhanva, Shreya
    Mahanand, B. S.
    [J]. 2021 5TH INTERNATIONAL CONFERENCE ON ELECTRICAL, ELECTRONICS, COMMUNICATION, COMPUTER TECHNOLOGIES AND OPTIMIZATION TECHNIQUES (ICEECCOT), 2021, : 418 - 421
  • [50] Improving Naive Bayes text classifier using smoothing methods
    He, Feng
    Ding, Xiaoqing
    [J]. ADVANCES IN INFORMATION RETRIEVAL, 2007, 4425 : 703 - +