Efficient Method for Feature Selection in Text Classification

Cited by: 0
Authors
Sun, Jian [1]
Zhang, Xiang [1]
Liao, Dan [1,2]
Chang, Victor [3]
Affiliations
[1] Univ Elect Sci & Technol China, Minist Educ, Key Lab Opt Fiber Sensing & Commun, Chengdu, Sichuan, Peoples R China
[2] UESTC, Guangdong Inst Elect & Informat Engn, Dongguan, Peoples R China
[3] Xian Jiaotong Liverpool Univ, Suzhou, Peoples R China
Keywords
Text classification; Feature selection; Chi-square test
DOI
Not available
Chinese Library Classification
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification codes
0808; 0809
Abstract
In text classification, Chinese word segmentation leaves each article with a large number of feature words, so the document vector can reach tens of thousands or even hundreds of thousands of dimensions. In theory, a large number of feature words characterizes a document better, but most of these features contribute very little to classification. It is therefore necessary to select the words that carry classification value in order to reduce the dimensionality of the computation. This paper studies traditional feature selection algorithms and, based on the shortcomings of the traditional chi-square test, presents an improved chi-square test method that incorporates frequency and inter-class concentration. Experiments show that the method performs well compared with the traditional chi-square test method.
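For orientation, the traditional chi-square test that the abstract takes as its baseline can be sketched as follows. This is a minimal illustration of the standard 2x2 term/class statistic only; the abstract does not give the paper's improved formula, so the frequency and inter-class concentration terms are not reproduced here. The function names `chi_square` and `select_features`, and the choice to score each term by its maximum over classes, are assumptions made for this sketch.

```python
from collections import Counter


def chi_square(n11, n10, n01, n00):
    """Standard chi-square statistic for a 2x2 term/class table.

    n11: documents in the class that contain the term
    n10: documents outside the class that contain the term
    n01: documents in the class that lack the term
    n00: documents outside the class that lack the term
    """
    n = n11 + n10 + n01 + n00
    denom = (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    if denom == 0:
        # Degenerate table (e.g. the term occurs in every document).
        return 0.0
    return n * (n11 * n00 - n10 * n01) ** 2 / denom


def select_features(docs, labels, k):
    """Return the k terms with the highest chi-square score.

    docs is a list of token lists, labels the matching class labels.
    Each term is scored by its maximum statistic over all classes
    (an assumption of this sketch, not taken from the paper).
    """
    n_docs = len(docs)
    class_sizes = Counter(labels)
    term_class_df = Counter()  # (term, label) -> document count
    term_df = Counter()        # term -> document count
    for tokens, label in zip(docs, labels):
        for term in set(tokens):  # presence only, not term frequency
            term_class_df[(term, label)] += 1
            term_df[term] += 1

    scores = {}
    for term, total in term_df.items():
        best = 0.0
        for label, size in class_sizes.items():
            n11 = term_class_df[(term, label)]
            n10 = total - n11
            n01 = size - n11
            n00 = n_docs - size - n10
            best = max(best, chi_square(n11, n10, n01, n00))
        scores[term] = best
    # Sort by descending score, breaking ties alphabetically.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]
```

Note that this baseline counts only whether a term is present in a document, not how often it occurs, which is one commonly cited limitation of the plain chi-square test and is consistent with the abstract's mention of adding frequency information.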
Pages: 6