Structure extended multinomial naive Bayes

Times cited: 70
Authors
Jiang, Liangxiao [1, 2]
Wang, Shasha [1]
Li, Chaoqun [3]
Zhang, Lungan [1]
Affiliations
[1] China Univ Geosci, Dept Comp Sci, Wuhan 430074, Peoples R China
[2] China Univ Geosci, Hubei Key Lab Intelligent Geoinformat Proc, Wuhan 430074, Peoples R China
[3] China Univ Geosci, Dept Math, Wuhan 430074, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Text classification; Multinomial naive Bayes; Structure extension; TERM-WEIGHTING SCHEME; SOFTWARE TOOL; TEXT; CLASSIFIERS; ALGORITHMS; KEEL
DOI
10.1016/j.ins.2015.09.037
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Multinomial naive Bayes (MNB) assumes that all attributes (i.e., features) are independent of each other given the class, and it therefore ignores all dependencies among attributes. However, in many real-world applications the attribute independence assumption required by MNB is violated, which harms its performance. One of the most direct ways to weaken this assumption is to extend the structure of MNB so that it explicitly represents attribute dependencies by adding arcs between attributes. Although a Bayesian network can represent arbitrary attribute dependencies, learning an optimal Bayesian network from high-dimensional text data is almost impossible, mainly because learning the optimal structure from such data is extremely time- and space-consuming. It would therefore be desirable to have a multinomial Bayesian network model that avoids structure learning while still representing attribute dependencies to some extent. In this paper, we propose a novel model called structure extended multinomial naive Bayes (SEMNB). SEMNB alleviates the attribute independence assumption by averaging all of the weighted one-dependence multinomial estimators. To learn SEMNB, we propose a simple but effective learning algorithm that requires no structure search. Experimental results on a large suite of benchmark text datasets show that SEMNB significantly outperforms MNB and is even markedly better than three other state-of-the-art improved algorithms: TOM, DWMNB, and Rw,cMNB. (C) 2015 Elsevier Inc. All rights reserved.
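The abstract contrasts MNB, which scores a document as P(c) times the product of P(w_t | c)^f_t over its word frequencies f_t, with SEMNB, which averages weighted one-dependence estimators. The Python sketch below illustrates both ideas under assumptions that are not taken from the paper: Laplace smoothing; every word w_i present in the test document acts in turn as an extra parent of all other words; P(w_t | w_i, c) is estimated from word counts in class-c training documents that contain w_i; and the weights W_i default to uniform because the abstract does not say how SEMNB sets them. The function names (`mnb_predict`, `averaged_ode_predict`) are hypothetical, and this is not the authors' implementation.

```python
import math
from typing import Dict, List, Optional, Sequence

# Each document is a term-frequency vector of length m (the vocabulary size).
Doc = Sequence[int]


def _log_priors(labels: List[str], classes: List[str]) -> Dict[str, float]:
    # Laplace-smoothed class prior: P(c) = (n_c + 1) / (n + |C|)
    n = len(labels)
    return {c: math.log((labels.count(c) + 1) / (n + len(classes))) for c in classes}


def _log_cond(docs: List[Doc], labels: List[str], classes: List[str], m: int,
              mask: Optional[List[bool]] = None) -> Dict[str, List[float]]:
    # Laplace-smoothed log P(w_t | c), counting only documents where mask is True.
    keep = mask if mask is not None else [True] * len(docs)
    table = {}
    for c in classes:
        counts = [0] * m
        for d, y, k in zip(docs, labels, keep):
            if y == c and k:
                for t, f in enumerate(d):
                    counts[t] += f
        total = sum(counts)
        table[c] = [math.log((counts[t] + 1) / (total + m)) for t in range(m)]
    return table


def mnb_predict(docs: List[Doc], labels: List[str], x: Doc) -> str:
    # MNB: arg max_c  log P(c) + sum_t f_t * log P(w_t | c)
    classes = sorted(set(labels))
    prior = _log_priors(labels, classes)
    cond = _log_cond(docs, labels, classes, len(x))
    scores = {c: prior[c] + sum(f * cond[c][t] for t, f in enumerate(x) if f)
              for c in classes}
    return max(scores, key=scores.get)


def _logsumexp(vals: List[float]) -> float:
    hi = max(vals)
    return hi + math.log(sum(math.exp(v - hi) for v in vals))


def averaged_ode_predict(docs: List[Doc], labels: List[str], x: Doc,
                         weights: Optional[Sequence[float]] = None) -> str:
    # Averaged one-dependence sketch: each word w_i present in x (with W_i > 0)
    # becomes an extra parent of every other word. ASSUMPTION: uniform W_i by
    # default, since the abstract does not specify the weighting scheme.
    m = len(x)
    w = list(weights) if weights is not None else [1.0] * m
    parents = [i for i, f in enumerate(x) if f > 0 and w[i] > 0]
    if not parents:
        return mnb_predict(docs, labels, x)  # nothing to condition on
    classes = sorted(set(labels))
    prior = _log_priors(labels, classes)
    # ASSUMPTION: P(w_t | w_i, c) is estimated from class-c documents containing w_i.
    cond = {i: _log_cond(docs, labels, classes, m, [d[i] > 0 for d in docs])
            for i in parents}
    scores = {}
    for c in classes:
        terms = [math.log(w[i]) + prior[c]
                 + sum(f * cond[i][c][t] for t, f in enumerate(x) if f and t != i)
                 for i in parents]
        # log of sum_i W_i * P_i(c, x), computed stably in log space
        scores[c] = _logsumexp(terms)
    return max(scores, key=scores.get)


# Toy example: 4 training documents over a 3-word vocabulary.
docs = [[3, 0, 1], [2, 1, 0], [0, 2, 3], [0, 3, 1]]
labels = ["a", "a", "b", "b"]
print(mnb_predict(docs, labels, [2, 0, 1]), averaged_ode_predict(docs, labels, [2, 0, 1]))
# prints: a a
```

A true weighted average would also divide by the sum of the W_i, but that constant is the same for every class and does not change the arg max, so the sketch omits it.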
Pages: 346-356
Number of pages: 11
Related papers
50 in total
  • [41] Fuzzy Discretization on the Multinomial Naive Bayes Method for Modeling Multiclass Classification of Corn Plant Diseases and Pests
    Resti, Yulia
    Irsan, Chandra
    Neardiaty, Adinda
    Annabila, Choirunnisa
    Yani, Irsyadi
    [J]. MATHEMATICS, 2023, 11 (08)
  • [42] Identification of Bacteriophage Virion Proteins Using Multinomial Naive Bayes with g-Gap Feature Tree
    Pan, Yanyuan
    Gao, Hui
    Lin, Hao
    Liu, Zhen
    Tang, Lixia
    Li, Songtao
    [J]. INTERNATIONAL JOURNAL OF MOLECULAR SCIENCES, 2018, 19 (06)
  • [43] AN EMPIRICAL BAYES ESTIMATE OF MULTINOMIAL PROBABILITIES
    ALAM, K
    MITRA, A
    [J]. COMMUNICATIONS IN STATISTICS-THEORY AND METHODS, 1986, 15 (10) : 3103 - 3127
  • [44] Using Character N-gram Features and Multinomial Naive Bayes for Sentiment Polarity Detection in Bengali Tweets
    Sarkar, Kamal
    [J]. PROCEEDINGS OF 2018 FIFTH INTERNATIONAL CONFERENCE ON EMERGING APPLICATIONS OF INFORMATION TECHNOLOGY (EAIT), 2018
  • [45] An intelligent model based on integrated inverse document frequency and multinomial Naive Bayes for current affairs news categorisation
    Kumar, Sachin
    Sharma, Aditya
    Reddy, B. Kartheek
    Sachan, Shreyas
    Jain, Vaibhav
    Singh, Jagvinder
    [J]. INTERNATIONAL JOURNAL OF SYSTEM ASSURANCE ENGINEERING AND MANAGEMENT, 2022, 13 (03) : 1341 - 1355
  • [46] Evolutional naive Bayes
    Jiang, LX
    Zhang, HJ
    Cai, ZH
    Su, J
    [J]. PROGRESS IN INTELLIGENCE COMPUTATION & APPLICATIONS, 2005 : 344 - 350
  • [47] Naive Bayes for regression
    Frank, E
    Trigg, L
    Holmes, G
    Witten, IH
    [J]. MACHINE LEARNING, 2000, 41 (01) : 5 - 25
  • [48] Naive Bayes clusterer
    Liu, Mujiexin
    Wang, Hongjun
    Li, Tian Rui
    Deng, Ping
    [J]. DATA SCIENCE AND KNOWLEDGE ENGINEERING FOR SENSING DECISION SUPPORT, 2018, 11 : 637 - 644
  • [49] Semi-Supervised Multinomial Naive Bayes for Text Classification by Leveraging Word-Level Statistical Constraint
    Zhao, Li
    Huang, Minlie
    Yao, Ziyu
    Su, Rongwei
    Jiang, Yingying
    Zhu, Xiaoyan
    [J]. THIRTIETH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2016 : 2877 - 2883