Efficient Method for Feature Selection in Text Classification

Times cited: 0
Authors
Sun, Jian [1]
Zhang, Xiang [1]
Liao, Dan [1,2]
Chang, Victor [3]
Affiliations
[1] Univ Elect Sci & Technol China, Minist Educ, Key Lab Opt Fiber Sensing & Commun, Chengdu, Sichuan, Peoples R China
[2] UESTC, Guangdong Inst Elect & Informat Engn, Dongguan, Peoples R China
[3] Xian Jiaotong Liverpool Univ, Suzhou, Peoples R China
Keywords
Text classification; Feature selection; Chi-square test
DOI
Not available
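Chinese Library Classification (CLC) number
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology]
Discipline code
0808; 0809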
Abstract
When classifying Chinese text, word segmentation leaves each article with a large number of feature words, so document vectors can reach tens or even hundreds of thousands of dimensions. Although, in theory, more feature words characterize a document more fully, most of the features a document contains are of little value for classification. The words that actually discriminate between classes must therefore be selected, which also reduces the dimensionality that later computation has to handle. This paper studies traditional feature selection algorithms and, to address the shortcomings of the traditional chi-square test, proposes an improved chi-square method that combines term frequency with inter-class concentration. Experiments show that the method improves on the traditional chi-square test.
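The abstract builds on the traditional chi-square (CHI) feature selection baseline. The sketch below shows only that baseline, under stated assumptions; it does not reproduce the paper's improved weighting by term frequency and inter-class concentration, whose formula is not given here. For a term t and class c it counts documents in a 2x2 table (A: in c and containing t; B: outside c and containing t; C: in c and lacking t; D: outside c and lacking t) and scores chi2 = N(AD - BC)^2 / ((A+C)(B+D)(A+B)(C+D)). The function names `chi_square_scores` and `select_top_k`, the choice of taking each term's maximum score over classes, and the toy English tokens (standing in for segmented Chinese words) are all illustrative assumptions.

```python
from collections import defaultdict


def chi_square_scores(docs, labels):
    """Traditional CHI score per term (hypothetical helper).

    docs:   list of token lists (e.g. the output of word segmentation)
    labels: list of class labels, one per document
    Each term's score is its maximum chi-square over all classes
    (an assumed aggregation; averaging is another common choice).
    """
    n = len(docs)
    classes = sorted(set(labels))
    class_size = {c: labels.count(c) for c in classes}

    # Document frequency of each term within each class.
    df = defaultdict(lambda: defaultdict(int))
    for tokens, label in zip(docs, labels):
        for term in set(tokens):
            df[term][label] += 1

    scores = {}
    for term, per_class in df.items():
        term_df = sum(per_class.values())
        best = 0.0
        for c in classes:
            a = per_class.get(c, 0)      # in c, contains term
            b = term_df - a              # not in c, contains term
            c_ = class_size[c] - a       # in c, lacks term
            d = n - a - b - c_           # not in c, lacks term
            denom = (a + c_) * (b + d) * (a + b) * (c_ + d)
            chi2 = n * (a * d - b * c_) ** 2 / denom if denom else 0.0
            best = max(best, chi2)
        scores[term] = best
    return scores


def select_top_k(docs, labels, k):
    """Keep the k highest-scoring terms (ties broken alphabetically)."""
    scores = chi_square_scores(docs, labels)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


if __name__ == "__main__":
    docs = [["team", "match"], ["match", "goal"],
            ["stock", "market"], ["market", "fund"]]
    labels = ["sports", "sports", "finance", "finance"]
    print(select_top_k(docs, labels, 2))
```

On this toy corpus, "match" and "market" each occur in every document of one class and in none of the other, so they receive the highest score (4.0) and are kept, while terms that appear in only one document score lower. A known weakness of plain CHI, which motivates refinements like the one the abstract describes, is that it uses only document counts and ignores how often a term occurs within each document.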
Pages: 6