Some effective techniques for naive Bayes text classification

被引:299
|
作者
Kim, Sang-Bum
Han, Kyoung-Soo
Rim, Hae-Chang
Myaeng, Sung Hyon
机构
[1] Korea Univ, Coll Informat & Commun, Dept Comp Sci & Engn, Seoul 136701, South Korea
[2] Informat & Commun Univ, Taejon 305732, South Korea
基金
日本学术振兴会;
关键词
text classification; naive Bayes classifier; Poisson model; feature weighting;
D O I
10.1109/TKDE.2006.180
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
While naive Bayes is quite effective in various data mining tasks, it shows a disappointing result in the automatic text classification problem. Based on the observation of naive Bayes for the natural language text, we found a serious problem in the parameter estimation process, which causes poor results in text classification domain. In this paper, we propose two empirical heuristics: per-document text normalization and feature weighting method. While these are somewhat ad hoc methods, our proposed naive Bayes text classifier performs very well in the standard benchmark collections, competing with state-of-the-art text classifiers based on a highly complex learning method such as SVM.
引用
收藏
页码:1457 / 1466
页数:10
相关论文
共 50 条
  • [1] Techniques for improving the performance of naive Bayes for text classification
    Schneider, KM
    [J]. COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING, 2005, 3406 : 682 - 693
  • [2] An Improvement to Naive Bayes for Text Classification
    Zhang, Wei
    Gao, Feng
    [J]. CEIS 2011, 2011, 15
  • [3] Adapting Hidden Naive Bayes for Text Classification
    Gan, Shengfeng
    Shao, Shiqi
    Chen, Long
    Yu, Liangjun
    Jiang, Liangxiao
    [J]. MATHEMATICS, 2021, 9 (19)
  • [4] Adapting naive Bayes tree for text classification
    Shasha Wang
    Liangxiao Jiang
    Chaoqun Li
    [J]. Knowledge and Information Systems, 2015, 44 : 77 - 89
  • [5] Bayesian Naive Bayes classifiers to text classification
    Xu, Shuo
    [J]. JOURNAL OF INFORMATION SCIENCE, 2018, 44 (01) : 48 - 59
  • [6] Feature selection for text classification with Naive Bayes
    Chen, Jingnian
    Huang, Houkuan
    Tian, Shengfeng
    Qu, Youli
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2009, 36 (03) : 5432 - 5435
  • [7] Adapting naive Bayes tree for text classification
    Wang, Shasha
    Jiang, Liangxiao
    Li, Chaoqun
    [J]. KNOWLEDGE AND INFORMATION SYSTEMS, 2015, 44 (01) : 77 - 89
  • [8] Naive Bayes for text classification with unbalanced classes
    Frank, Eibe
    Bouckaert, Remco R.
    [J]. KNOWLEDGE DISCOVERY IN DATABASES: PKDD 2006, PROCEEDINGS, 2006, 4213 : 503 - 510
  • [9] A Technique for Improving the Performance of Naive Bayes Text Classification
    Jiang, Yuqian
    Lin, Huaizhong
    Wang, Xuesong
    Lu, Dongming
    [J]. WEB INFORMATION SYSTEMS AND MINING, PT II, 2011, 6988 : 196 - 203
  • [10] An improved FloatBoost algorithm for Naive Bayes text classification
    Liu, XM
    Yin, JW
    Dong, JX
    Ghafoor, MA
    [J]. ADVANCES IN WEB-AGE INFORMATION MANAGEMENT, PROCEEDINGS, 2005, 3739 : 162 - 171