Feature selection for text classification with Naive Bayes

Cited by: 326
Authors
Chen, Jingnian [1, 2]
Huang, Houkuan [1]
Tian, Shengfeng [1]
Qu, Youli [1]
Affiliations
[1] Beijing Jiaotong Univ, Sch Comp & Informat Technol, Beijing 100044, Peoples R China
[2] Shandong Univ Finance, Dept Informat & Comp Sci, Jinan 250014, Shandong, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Text classification; Feature selection; Text preprocessing; Naive Bayes; NEAREST-NEIGHBOR;
DOI
10.1016/j.eswa.2008.06.054
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
As an important preprocessing technique in text classification, feature selection can improve the scalability, efficiency and accuracy of a text classifier. In general, a good feature selection method should take both domain and algorithm characteristics into account. Because the Naive Bayesian classifier is simple, efficient and highly sensitive to feature selection, research on feature selection designed specifically for it is significant. This paper presents two feature evaluation metrics for the Naive Bayesian classifier applied to multi-class text datasets: Multi-class Odds Ratio (MOR) and Class Discriminating Measure (CDM). Text classification experiments with Naive Bayesian classifiers were carried out on two multi-class text collections. The results indicate that CDM and MOR select features clearly more effectively than the other feature selection approaches compared. (C) 2008 Elsevier Ltd. All rights reserved.
Pages: 5432 - 5435
Page count: 4
Related papers
50 in total
  • [1] Text Classification Based on Naive Bayes Algorithm with Feature Selection
    Chen, Zhenguo
    Shi, Guang
    Wang, Xiaoju
    [J]. INFORMATION-AN INTERNATIONAL INTERDISCIPLINARY JOURNAL, 2012, 15 (10) : 4255 - 4260
  • [2] Feature subset selection using naive Bayes for text classification
    Feng, Guozhong
    Guo, Jianhua
    Jing, Bing-Yi
    Sun, Tieli
    [J]. PATTERN RECOGNITION LETTERS, 2015, 65 : 109 - 115
  • [3] Divergence-Based Feature Selection for Naive Bayes Text Classification
    Wang, Huizhen
    Zhu, Jingbo
    Su, Keh-Yih
    [J]. IEEE NLP-KE 2008: PROCEEDINGS OF INTERNATIONAL CONFERENCE ON NATURAL LANGUAGE PROCESSING AND KNOWLEDGE ENGINEERING, 2008, : 209 - +
  • [4] Discrimination-based feature selection for multinomial naive Bayes text classification
    Zhu, Jingbo
    Wang, Huizhen
    Zhang, Xijuan
    [J]. COMPUTER PROCESSING OF ORIENTAL LANGUAGES, PROCEEDINGS: BEYOND THE ORIENT: THE RESEARCH CHALLENGES AHEAD, 2006, 4285 : 149 - +
  • [5] A New Feature Selection Approach to Naive Bayes Text Classifiers
    Zhang, Lungan
    Jiang, Liangxiao
    Li, Chaoqun
    [J]. INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2016, 30 (02)
  • [6] DEEP FEATURE WEIGHTING IN NAIVE BAYES FOR CHINESE TEXT CLASSIFICATION
    Jiang, Qiaowei
    Wang, Wen
    Han, Xu
    Zhang, Shasha
    Wang, Xinyan
    Wang, Cong
    [J]. PROCEEDINGS OF 2016 4TH IEEE INTERNATIONAL CONFERENCE ON CLOUD COMPUTING AND INTELLIGENCE SYSTEMS (IEEE CCIS 2016), 2016, : 160 - 164
  • [7] Toward Optimal Feature Selection in Naive Bayes for Text Categorization
    Tang, Bo
    Kay, Steven
    He, Haibo
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2016, 28 (09) : 2508 - 2521
  • [8] Feature selection for multi-label naive Bayes classification
    Zhang, Min-Ling
    Pena, Jose M.
    Robles, Victor
    [J]. INFORMATION SCIENCES, 2009, 179 (19) : 3218 - 3229
  • [9] Deep feature weighting for naive Bayes and its application to text classification
    Jiang, Liangxiao
    Li, Chaoqun
    Wang, Shasha
    Zhang, Lungan
    [J]. ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2016, 52 : 26 - 39
  • [10] Naive Feature Selection: Sparsity in Naive Bayes
    Askari, Armin
    d'Aspremont, Alex
    El Ghaoui, Laurent
    [J]. INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 108, 2020, 108 : 1813 - 1821