Naive Bayes for text classification with unbalanced classes

Cited by: 0
Authors
Frank, Eibe [1]
Bouckaert, Remco R. [2]
Affiliations
[1] Univ Waikato, Dept Comp Sci, Hamilton, New Zealand
[2] Xtal Mt Informat Technol, Auckland, New Zealand
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Multinomial naive Bayes (MNB) is a popular method for document classification due to its computational efficiency and relatively good predictive performance. It has recently been established that predictive performance can be improved further by appropriate data transformations [1,2]. In this paper we present another transformation that is designed to combat a potential problem with the application of MNB to unbalanced datasets. We propose an appropriate correction by adjusting attribute priors. This correction can be implemented as another data normalization step, and we show that it can significantly improve the area under the ROC curve. We also show that the modified version of MNB is very closely related to the simple centroid-based classifier and compare the two methods empirically.
Pages: 503-510
Page count: 8
Related papers
50 in total
  • [41] Constrained domain maximum likelihood estimation for naive Bayes text classification
    Jesús Andrés-Ferrer
    Alfons Juan
    [J]. Pattern Analysis and Applications, 2010, 13 : 189 - 196
  • [42] On word frequency information and negative evidence in Naive Bayes text classification
    Schneider, KM
    [J]. ADVANCES IN NATURAL LANGUAGE PROCESSING, 2004, 3230 : 474 - 485
  • [43] A novel text classification algorithm based on Naive Bayes and KL-divergence
    Wang, BY
    Zhang, SM
    [J]. PDCAT 2005: SIXTH INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED COMPUTING, APPLICATIONS AND TECHNOLOGIES, PROCEEDINGS, 2005, : 913 - 915
  • [44] The naive Bayes text classification algorithm based on rough set in the cloud platform
    Dai, Yugang
    Sun, Haosheng
    [J]. Journal of Chemical and Pharmaceutical Research, 2014, 6 (07) : 1636 - 1643
  • [45] Discrimination-based feature selection for multinomial naive Bayes text classification
    Zhu, Jingbo
    Wang, Huizhen
    Zhang, Xijuan
    [J]. COMPUTER PROCESSING OF ORIENTAL LANGUAGES, PROCEEDINGS: BEYOND THE ORIENT: THE RESEARCH CHALLENGES AHEAD, 2006, 4285 : 149 - +
  • [46] A Text Classification Approach using Parallel Naive Bayes in Big Data Context
    Amazal, Houda
    Ramdani, Mohammed
    Kissi, Mohamed
    [J]. PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON INTELLIGENT SYSTEMS: THEORIES AND APPLICATIONS (SITA'18), 2018,
  • [47] Combining naive Bayes and n-gram language models for text classification
    Peng, FC
    Schuurmans, D
    [J]. ADVANCES IN INFORMATION RETRIEVAL, 2003, 2633 : 335 - 350
  • [48] Acceleration of Naive-Bayes Algorithm on Multicore Processor for Massive Text Classification
    Zhou, Lijun
    Yu, Zhiyi
    Lin, Jie
    Zhu, Shikai
    Shi, Weijing
    Zhou, Haijie
    Song, Kunpeng
    Zeng, Xiaoyang
    [J]. 2014 14TH INTERNATIONAL SYMPOSIUM ON INTEGRATED CIRCUITS (ISIC), 2014, : 344 - 347
  • [49] Semantic Text Classification with Tensor Space Model-based Naive Bayes
    Kim, Han-joon
    Kim, Jiyun
    Kim, Jinseog
    [J]. 2016 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC), 2016, : 4206 - 4210
  • [50] Personality Classification Based on Twitter Text Using Naive Bayes, KNN and SVM
    Pratama, Bayu Yudha
    Sarno, Riyanarto
    [J]. 2015 INTERNATIONAL CONFERENCE ON DATA AND SOFTWARE ENGINEERING (ICODSE), 2015, : 170 - 174