Divergence-Based Feature Selection for Naive Bayes Text Classification

Cited by: 0
Authors
Wang, Huizhen [1]
Zhu, Jingbo [1]
Su, Keh-Yih [2]
Affiliations
[1] Northeastern Univ, Nat Language Proc Lab, Shenyang, Liaoning, Peoples R China
[2] Behav Design Corp, Hsinchu, Taiwan
Keywords
Divergence-based; feature selection; text classification; overall-divergence
DOI
Not available
Chinese Library Classification (CLC) number
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
This paper proposes a new divergence-based approach to feature selection for naive Bayes text classification. The approach ranks features directly by their discrimination power, using a criterion named overall-divergence that is based on divergence measures evaluated between pairs of class density functions. Compared with other state-of-the-art algorithms (e.g., information gain (IG) and the chi-square statistic (CHI)), the proposed approach shows more discrimination power when classifying confusing classes, and achieves better or comparable performance on the evaluation data sets.
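The abstract does not give the formula for overall-divergence, so the following is only a rough sketch of the general idea of ranking features by divergence between class pairs, not the authors' method. It assumes each feature is modeled as a Bernoulli (present/absent) variable per class with Laplace smoothing, uses the symmetric Kullback-Leibler (J) divergence as the pairwise measure, and sums it over all class pairs. The function names `bernoulli_j_divergence` and `rank_features` are hypothetical.

```python
import math
from itertools import combinations


def bernoulli_j_divergence(p, q):
    """Symmetric KL (J) divergence between Bernoulli(p) and Bernoulli(q).

    J = KL(p||q) + KL(q||p); both p and q must lie strictly in (0, 1).
    """
    return (p - q) * math.log(p / q) + (q - p) * math.log((1 - p) / (1 - q))


def rank_features(docs, labels):
    """Rank terms by summed pairwise divergence between classes.

    docs   : list of sets of tokens (one set per document)
    labels : list of class labels, same length as docs
    Returns (terms sorted by descending score, dict of term -> score).
    ASSUMPTION: this scoring rule is illustrative, not the paper's
    exact overall-divergence criterion.
    """
    classes = sorted(set(labels))
    vocab = sorted(set().union(*docs))
    n_docs = {c: sum(1 for y in labels if y == c) for c in classes}

    scores = {}
    for term in vocab:
        # Laplace-smoothed estimate of P(term present | class)
        p = {}
        for c in classes:
            hits = sum(1 for d, y in zip(docs, labels) if y == c and term in d)
            p[c] = (hits + 1) / (n_docs[c] + 2)
        # Sum the divergence over every unordered pair of classes
        scores[term] = sum(
            bernoulli_j_divergence(p[a], p[b])
            for a, b in combinations(classes, 2)
        )

    # Ties are broken alphabetically so the ranking is deterministic
    ranked = sorted(vocab, key=lambda t: (-scores[t], t))
    return ranked, scores


if __name__ == "__main__":
    docs = [{"goal", "match"}, {"goal", "team"}, {"vote", "team"}, {"vote", "law"}]
    labels = ["sport", "sport", "politics", "politics"]
    ranked, scores = rank_features(docs, labels)
    for term in ranked:
        print(f"{term}\t{scores[term]:.4f}")
```

In this toy corpus, terms that occur in only one class receive the highest scores, while a term that is equally frequent in both classes scores zero. A per-pair score like this also makes it possible to inspect which features separate a specific pair of easily confused classes, which is the kind of behavior the abstract emphasizes.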
Pages: 209 / +
Page count: 3