Some effective techniques for naive Bayes text classification

Cited by: 299
Authors
Kim, Sang-Bum
Han, Kyoung-Soo
Rim, Hae-Chang
Myaeng, Sung Hyon
Affiliations
[1] Korea Univ, Coll Informat & Commun, Dept Comp Sci & Engn, Seoul 136701, South Korea
[2] Informat & Commun Univ, Taejon 305732, South Korea
Funding
Japan Society for the Promotion of Science
Keywords
text classification; naive Bayes classifier; Poisson model; feature weighting
DOI
10.1109/TKDE.2006.180
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
While naive Bayes is quite effective in various data mining tasks, it shows disappointing results in automatic text classification. Based on our observation of naive Bayes applied to natural language text, we found a serious problem in the parameter estimation process, which causes poor results in the text classification domain. In this paper, we propose two empirical heuristics: per-document text normalization and a feature weighting method. While these are somewhat ad hoc methods, our proposed naive Bayes text classifier performs very well on the standard benchmark collections, competing with state-of-the-art text classifiers based on highly complex learning methods such as SVM.
Pages: 1457-1466
Page count: 10