An Improved Information Gain Feature Selection Algorithm for SVM Text Classifier

Cited by: 12
Authors
Xu, Jiamin [1]
Jiang, Hong [1]
Affiliations
[1] East China Normal Univ, Dept Comp Ctr, Shanghai, Peoples R China
Keywords
information gain algorithm; feature selection; between-class concentration distribution factor; within-class word frequency dispersion distribution factor
DOI
10.1109/CyberC.2015.53
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
The choice of feature selection algorithm has a great influence on the accuracy of text categorization. The traditional information gain (IG) feature selection algorithm tends to select features that rarely appear in the specified category but frequently appear in other categories. To overcome this drawback, an improved IG feature selection method is proposed on the basis of an in-depth analysis of related algorithms. First, features are selected separately for each category of the data set, and the features from different categories are merged by an optimized method. Then, the IG weight is calculated from the probability of appearance of these features. Finally, a between-class concentration distribution factor and a within-class word frequency dispersion distribution factor are introduced. An SVM classifier is used to verify the algorithm. The results show that the improved method performs better than the original IG and two other improved methods.
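For orientation, the traditional IG score that the abstract starts from can be sketched as follows. This is a minimal illustration of the classic document-level formula IG(t) = H(C) - P(t) H(C|t) - P(not t) H(C|not t), not the improved method proposed in the paper; the function names, the toy documents, and the labels are all hypothetical.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy in bits of a distribution given as raw counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(docs, labels, term):
    """Classic IG of a term, based on document-level presence or absence.

    docs   : list of sets of terms (hypothetical toy representation)
    labels : list of class labels, one per document
    """
    classes = sorted(set(labels))
    with_term = Counter(l for d, l in zip(docs, labels) if term in d)
    without_term = Counter(l for d, l in zip(docs, labels) if term not in d)
    n = len(docs)
    n_t = sum(with_term.values())
    h_c = entropy([labels.count(c) for c in classes])
    h_t = entropy([with_term[c] for c in classes])
    h_not_t = entropy([without_term[c] for c in classes])
    return h_c - (n_t / n) * h_t - ((n - n_t) / n) * h_not_t


# Toy corpus, invented purely for illustration.
docs = [{"goal", "match"}, {"goal", "team"}, {"stock", "market"}, {"stock", "team"}]
labels = ["sport", "sport", "finance", "finance"]
scores = {t: information_gain(docs, labels, t) for t in ("goal", "stock", "team")}
```

In this toy corpus "goal" and "stock" receive the same score even though each is entirely absent from one of the two classes, because the classic formula rewards absence as much as presence. That symmetry is one way to read the drawback the abstract describes.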
Pages: 273-276
Number of pages: 4
Related papers
50 in total
  • [1] Feature Selection Methods for an Improved SVM Classifier
    Morariu, Daniel
    Vintan, Lucian N.
    Tresp, Volker
    [J]. PROCEEDINGS OF WORLD ACADEMY OF SCIENCE, ENGINEERING AND TECHNOLOGY, VOL 14, 2006, 14 : 83 - +
  • [2] Improved Information Gain-based Feature Selection for Text Categorization
    Gao, Zhe
    Xu, Yajing
    Meng, Fanyu
    Qi, Feng
    Lin, Zhiqing
    [J]. 2014 4TH INTERNATIONAL CONFERENCE ON WIRELESS COMMUNICATIONS, VEHICULAR TECHNOLOGY, INFORMATION THEORY AND AEROSPACE & ELECTRONIC SYSTEMS (VITAE), 2014,
  • [3] Research on Text Feature Selection Algorithm Based on Information Gain and Feature Relation Tree
    Zhang, Hong
    Ren, Yong-gong
    Yang, Xue
    [J]. 2013 10TH WEB INFORMATION SYSTEM AND APPLICATION CONFERENCE (WISA 2013), 2013, : 446 - 449
  • [4] Feature selection algorithm for text classification based on improved mutual information
    丛帅
    张积宾
    徐志明
    王宇颖
    [J]. Journal of Harbin Institute of Technology(New series), 2011, (03) : 144 - 148
  • [5] Feature selection algorithm for text classification based on improved mutual information
    丛帅
    张积宾
    徐志明
    王宇颖
    [J]. Journal of Harbin Institute of Technology, 2011, 18 (03) : 144 - 148
  • [6] Improved feature selection algorithm based on SVM and correlation
    Xie, Zong-Xia
    Hu, Qing-Hua
    Yu, Da-Ren
    [J]. ADVANCES IN NEURAL NETWORKS - ISNN 2006, PT 1, 2006, 3971 : 1373 - 1380
  • [7] An ensemble svm classifier with feature selection
    Hu, Han
    En-en, Ren
    [J]. 2007 INTERNATIONAL SYMPOSIUM ON COMPUTER SCIENCE & TECHNOLOGY, PROCEEDINGS, 2007, : 6 - 8
  • [8] An Improved Feature Selection Method Based on Information Gain
    Li, Yanling
    Sun, Wenxia
    [J]. INTERNATIONAL SYMPOSIUM ON SIGNAL PROCESSING BIOMEDICAL ENGINEERING, AND INFORMATICS (SPBEI 2013), 2014, : 530 - 535
  • [9] Improved Mutual Information Method For Text Feature Selection
    Ding Xiaoming
    Tang Yan
    [J]. PROCEEDINGS OF THE 2013 8TH INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE & EDUCATION (ICCSE 2013), 2013, : 163 - 166
  • [10] A Text Feature Selection Algorithm Based on Improved TFIDF
    Chengcheng Yang
    Xingshi He
    [J]. PROCEEDINGS OF THE 2008 CHINESE CONFERENCE ON PATTERN RECOGNITION (CCPR 2008), 2008, : 416 - 419