Some effective techniques for naive Bayes text classification

Times cited: 299
Authors
Kim, Sang-Bum
Han, Kyoung-Soo
Rim, Hae-Chang
Myaeng, Sung Hyon
Affiliations
[1] Korea University, College of Information and Communications, Department of Computer Science and Engineering, Seoul 136701, South Korea
[2] Information and Communications University, Taejon 305732, South Korea
Funding
Japan Society for the Promotion of Science
Keywords
text classification; naive Bayes classifier; Poisson model; feature weighting
DOI
10.1109/TKDE.2006.180
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
While naive Bayes is quite effective in various data mining tasks, it gives disappointing results in automatic text classification. By observing how naive Bayes behaves on natural language text, we found a serious problem in its parameter estimation process that causes poor results in the text classification domain. In this paper, we propose two empirical heuristics: per-document text normalization and a feature weighting method. Although these methods are somewhat ad hoc, our proposed naive Bayes text classifier performs very well on standard benchmark collections, competing with state-of-the-art text classifiers based on highly complex learning methods such as SVM.
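The abstract names per-document text normalization and feature weighting but does not spell out their formulas here. The sketch below shows one generic way to combine both ideas with a multinomial naive Bayes classifier. It is an illustration under stated assumptions, not the paper's method: rescaling each document to a fixed total length (`target_len`), add-alpha smoothing (`alpha`), and the per-term `weights` dictionary are all hypothetical choices, and the Poisson model listed in the keywords is not implemented.

```python
import math
from collections import Counter, defaultdict

def normalize_counts(counts, target_len=100.0):
    """Rescale one document's term counts so they sum to target_len.

    target_len is a hypothetical constant chosen for illustration.
    """
    total = sum(counts.values())
    if total == 0:
        return {}
    return {t: c * target_len / total for t, c in counts.items()}

def train(docs, labels, alpha=1.0, target_len=100.0):
    """Fit a multinomial naive Bayes model on length-normalized counts."""
    vocab = set()
    class_docs = Counter(labels)
    # class -> term -> accumulated normalized count
    term_mass = defaultdict(lambda: defaultdict(float))
    for tokens, y in zip(docs, labels):
        norm = normalize_counts(Counter(tokens), target_len)
        vocab.update(norm)
        for t, c in norm.items():
            term_mass[y][t] += c
    n_docs = len(labels)
    log_priors = {y: math.log(n / n_docs) for y, n in class_docs.items()}
    log_probs = {}
    for y in class_docs:
        # Add-alpha (Laplace) smoothing over the whole vocabulary
        denom = sum(term_mass[y].values()) + alpha * len(vocab)
        log_probs[y] = {t: math.log((term_mass[y][t] + alpha) / denom) for t in vocab}
    return log_priors, log_probs, vocab

def predict(tokens, model, weights=None, target_len=100.0):
    """Return the highest-scoring class; weights maps term -> hypothetical feature weight."""
    log_priors, log_probs, vocab = model
    weights = weights or {}
    # Out-of-vocabulary terms are ignored; the rest are length-normalized
    norm = normalize_counts(Counter(t for t in tokens if t in vocab), target_len)
    scores = {}
    for y, prior in log_priors.items():
        s = prior
        for t, c in norm.items():
            # Each term's log-likelihood contribution is scaled by its weight (default 1.0)
            s += weights.get(t, 1.0) * c * log_probs[y][t]
        scores[y] = s
    return max(scores, key=scores.get)

# Toy usage with made-up data
docs = [["cheap", "pills", "buy"], ["buy", "cheap", "cheap"],
        ["meeting", "agenda", "notes"], ["project", "meeting", "notes"]]
labels = ["spam", "spam", "ham", "ham"]
model = train(docs, labels)
print(predict(["cheap", "buy"], model))  # spam
```

Rescaling every document to the same total mass keeps long documents from dominating the per-class term estimates, and a weight below 1.0 damps a term's influence at prediction time. How the weights themselves are chosen is left open in this sketch.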
Pages: 1457-1466
Page count: 10
Related papers (50 in total)
  • [31] An Improved Naive Bayes Text Classification Algorithm In Chinese Information Processing
    Yuan, Lingling
    [J]. THIRD INTERNATIONAL SYMPOSIUM ON COMPUTER SCIENCE AND COMPUTATIONAL TECHNOLOGY (ISCSCT 2010), 2010, : 267 - 269
  • [32] Deep feature weighting for naive Bayes and its application to text classification
    Jiang, Liangxiao
    Li, Chaoqun
    Wang, Shasha
    Zhang, Lungan
    [J]. ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2016, 52 : 26 - 39
  • [33] A Method of Text Classification Combining Naive Bayes and the Similarity Computing Algorithms
    Hong, Yinghan
    Mai, Guizhen
    Zeng, Hui
    Guo, Cai
    [J]. WEB TECHNOLOGIES AND APPLICATIONS, APWEB 2015 WORKSHOPS, 2015, 9461 : 3 - 14
  • [34] Divergence-Based Feature Selection for Naive Bayes Text Classification
    Wang, Huizhen
    Zhu, Jingbo
    Su, Keh-Yih
    [J]. IEEE NLP-KE 2008: PROCEEDINGS OF INTERNATIONAL CONFERENCE ON NATURAL LANGUAGE PROCESSING AND KNOWLEDGE ENGINEERING, 2008, : 209 - +
  • [35] Combining fuzzy clustering with Naive Bayes augmented learning in text classification
    Liu, Lizhen
    Sun, Xiaojing
    Song, Hantao
    [J]. 2006 1ST INTERNATIONAL SYMPOSIUM ON PERVASIVE COMPUTING AND APPLICATIONS, PROCEEDINGS, 2006, : 168 - +
  • [36] Compensation strategy of unseen feature words in naive Bayes text classification
    Authors not listed (affiliation: School of Management, Harbin Institute of Technology, Harbin 150001, China)
    [J]. Harbin Gongye Daxue Xuebao, 2008, 6 : 956 - 960
  • [37] Chinese News Text Multi Classification Based on Naive Bayes Algorithm
    Wang, Fei
    Deng, Xin
    Hou, Lunqing
    [J]. ISCSIC'18: PROCEEDINGS OF THE 2ND INTERNATIONAL SYMPOSIUM ON COMPUTER SCIENCE AND INTELLIGENT CONTROL, 2018,
  • [38] Naive Bayes Text Classification with Positive Features Selected by Statistical Method
    Meena, M. Janaki
    Chandran, K. R.
    [J]. FIRST INTERNATIONAL CONFERENCE ON ADVANCED COMPUTING 2009 (ICAC 2009), 2009, : 28 - +
  • [39] Constrained domain maximum likelihood estimation for naive Bayes text classification
    Andrés-Ferrer, Jesús
    Juan, Alfons
    [J]. PATTERN ANALYSIS AND APPLICATIONS, 2010, 13 : 189 - 196
  • [40] On word frequency information and negative evidence in Naive Bayes text classification
    Schneider, KM
    [J]. ADVANCES IN NATURAL LANGUAGE PROCESSING, 2004, 3230 : 474 - 485