Structure extended multinomial naive Bayes

Cited by: 70
Authors
Jiang, Liangxiao [1 ,2 ]
Wang, Shasha [1 ]
Li, Chaoqun [3 ]
Zhang, Lungan [1 ]
Affiliations
[1] China Univ Geosci, Dept Comp Sci, Wuhan 430074, Peoples R China
[2] China Univ Geosci, Hubei Key Lab Intelligent Geoinformat Proc, Wuhan 430074, Peoples R China
[3] China Univ Geosci, Dept Math, Wuhan 430074, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Text classification; Multinomial naive Bayes; Structure extension; TERM-WEIGHTING SCHEME; SOFTWARE TOOL; TEXT; CLASSIFIERS; ALGORITHMS; KEEL;
DOI
10.1016/j.ins.2015.09.037
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
Multinomial naive Bayes (MNB) assumes that all attributes (i.e., features) are independent of each other given the class, and it ignores all dependencies among attributes. However, in many real-world applications, the attribute independence assumption required by MNB is often violated, which harms its performance. To weaken this assumption, one of the most direct ways is to extend its structure to represent attribute dependencies explicitly by adding arcs between attributes. On the other hand, although a Bayesian network can represent arbitrary attribute dependencies, learning an optimal Bayesian network from high-dimensional text data is almost impossible. The main reason is that learning the optimal structure of a Bayesian network from high-dimensional text data is extremely time- and space-consuming. Thus, it would be desirable for a multinomial Bayesian network model to avoid structure learning and still be able to represent attribute dependencies to some extent. In this paper, we propose a novel model called structure extended multinomial naive Bayes (SEMNB). SEMNB alleviates the attribute independence assumption by averaging all of the weighted one-dependence multinomial estimators. To learn SEMNB, we propose a simple but effective learning algorithm without structure searching. The experimental results on a large suite of benchmark text datasets show that SEMNB significantly outperforms MNB and is even markedly better than three other state-of-the-art improved algorithms: TOM, DWMNB, and Rw,cMNB. (C) 2015 Elsevier Inc. All rights reserved.
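For orientation, the MNB baseline that the abstract starts from can be sketched as follows. This is a minimal illustration of plain multinomial naive Bayes with Laplace smoothing, not the SEMNB algorithm: the abstract does not specify the weights or the form of the one-dependence estimators, so they are not attempted here. The function names `train_mnb` and `predict_mnb`, the smoothing parameter `alpha`, and the toy data are assumptions made for this example.

```python
import numpy as np

def train_mnb(X, y, n_classes, alpha=1.0):
    """Fit plain MNB. X: (n_docs, n_words) word counts; y: integer labels."""
    n_docs, n_words = X.shape
    log_prior = np.empty(n_classes)
    log_cond = np.empty((n_classes, n_words))
    for c in range(n_classes):
        Xc = X[y == c]
        # Smoothed class prior P(c)
        log_prior[c] = np.log((Xc.shape[0] + alpha) / (n_docs + alpha * n_classes))
        # Smoothed word probabilities P(w | c); each word is treated as
        # independent of the others given the class (the MNB assumption)
        word_counts = Xc.sum(axis=0)
        log_cond[c] = np.log((word_counts + alpha) / (word_counts.sum() + alpha * n_words))
    return log_prior, log_cond

def predict_mnb(X, log_prior, log_cond):
    """Return argmax_c of log P(c) + sum_w count(w) * log P(w | c)."""
    scores = log_prior + X @ log_cond.T
    return scores.argmax(axis=1)

# Toy data (hypothetical): 4 documents, 3 vocabulary words, 2 classes
X_train = np.array([[3, 0, 1], [2, 0, 0], [0, 4, 1], [0, 3, 2]])
y_train = np.array([0, 0, 1, 1])
log_prior, log_cond = train_mnb(X_train, y_train, n_classes=2)
pred = predict_mnb(np.array([[2, 0, 1], [0, 2, 1]]), log_prior, log_cond)
```

Because the score is a sum of per-word terms, no interaction between words can influence the prediction; that is the limitation the structure extension described in the abstract aims to relax.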
Pages: 346 - 356
Page count: 11
Related papers
50 in total
  • [1] Multinomial naive Bayes for text categorization revisited
    Kibriya, AM
    Frank, E
    Pfahringer, B
    Holmes, G
    [J]. AI 2004: ADVANCES IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 2004, 3339 : 488 - 499
  • [2] Mixture of latent multinomial naive Bayes classifier
    Harzevili, Nima Shiri
    Alizadeh, Sasan H.
    [J]. APPLIED SOFT COMPUTING, 2018, 69 : 516 - 527
  • [3] MapReduce Implementation of a Multinomial and Mixed Naive Bayes Classifier
    Bagui, Sikha
    Devulapalli, Keerthi
    John, Sharon
    [J]. INTERNATIONAL JOURNAL OF INTELLIGENT INFORMATION TECHNOLOGIES, 2020, 16 (02) : 1 - 23
  • [4] Modifying Naive Bayes classifier for multinomial text classification
    [J]. 1600, Institute of Electrical and Electronics Engineers Inc., United States
  • [5] Multinomial Naive Bayes Classification Model for Sentiment Analysis
    Abbas, Muhammad
    Memon, Kamran Ali
    Jamali, Abdul Aleem
    Memon, Saleemullah
    Ahmed, Anees
    [J]. INTERNATIONAL JOURNAL OF COMPUTER SCIENCE AND NETWORK SECURITY, 2019, 19 (03) : 62 - 67
  • [6] Modifying Naive Bayes Classifier for Multinomial Text Classification
    Sharma, Neha
    Singh, Manoj
    [J]. 2016 INTERNATIONAL CONFERENCE ON RECENT ADVANCES AND INNOVATIONS IN ENGINEERING (ICRAIE), 2016,
  • [7] A Query Expansion Method Using Multinomial Naive Bayes
    Silva, Sergio
    Seara Vieira, Adrian
    Celard, Pedro
    Iglesias, Eva Lorenzo
    Borrajo, Lourdes
    [J]. APPLIED SCIENCES-BASEL, 2021, 11 (21):
  • [8] Multinomial Naive Bayes for real-time gender recognition
    Vergara, Diego
    Hernandez, Sergio
    Jorquera, Felipe
    [J]. 2016 XXI SYMPOSIUM ON SIGNAL PROCESSING, IMAGES AND ARTIFICIAL VISION (STSIVA), 2016,
  • [9] Evolving extended naive bayes classifiers
    Klawonn, Frank
    Angelov, Plamen
    [J]. ICDM 2006: Sixth IEEE International Conference on Data Mining, Workshops, 2006, : 643 - 647
  • [10] Indoor Localization Using Improved Multinomial Naive Bayes Technique
    Ul Haq, Muhammad Aziz
    Kamboh, Hammid Mehmood Allahdita
    Akram, Usman
    Sohail, Amer
    Iram, Hifsa
    [J]. PROCEEDINGS OF THE THIRD INTERNATIONAL AFRO-EUROPEAN CONFERENCE FOR INDUSTRIAL ADVANCEMENT-AECIA 2016, 2018, 565 : 321 - 329