Efficient Method for Feature Selection in Text Classification

Cited by: 0
Authors
Sun, Jian [1]
Zhang, Xiang [1]
Liao, Dan [1, 2]
Chang, Victor [3]
Affiliations
[1] Univ Elect Sci & Technol China, Minist Educ, Key Lab Opt Fiber Sensing & Commun, Chengdu, Sichuan, Peoples R China
[2] UESTC, Guangdong Inst Elect & Informat Engn, Dongguan, Peoples R China
[3] Xian Jiaotong Liverpool Univ, Suzhou, Peoples R China
Keywords
Text classification; Feature selection; Chi-square test;
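DOI
Not available
Chinese Library Classification number
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];
Discipline classification code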
0808 ; 0809 ;
Abstract
In text classification, Chinese word segmentation yields a large number of feature words per article, so the document vector can reach tens of thousands or even hundreds of thousands of dimensions. In theory, a large number of feature words can characterize a document better, but most of the features a document contains are of little value for classification. It is therefore necessary to select the words that do have classification value, in order to reduce the dimensionality of the computation. This paper studies traditional feature selection algorithms and, to address the shortcomings of the traditional chi-square test, presents an improved chi-square test method that incorporates frequency and interclass concentration. Experiments show that the method improves on the traditional chi-square test method.
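The abstract does not give the formula for the improved method, so the following is only a minimal sketch of the traditional chi-square baseline it refers to, assuming documents are already segmented into word lists. The function names `chi_square` and `select_features`, the use of the maximum score over classes, and the toy data are illustrative assumptions, not taken from the paper; the frequency and interclass-concentration terms of the proposed method are deliberately not modeled.

```python
from collections import Counter


def chi_square(n11, n10, n01, n00):
    """Chi-square statistic for a 2x2 term/class contingency table.

    n11: documents in the class that contain the term
    n10: documents outside the class that contain the term
    n01: documents in the class that lack the term
    n00: documents outside the class that lack the term
    """
    n = n11 + n10 + n01 + n00
    denom = (n11 + n01) * (n10 + n00) * (n11 + n10) * (n01 + n00)
    if denom == 0:
        # A term present in all or none of the documents carries no signal.
        return 0.0
    return n * (n11 * n00 - n10 * n01) ** 2 / denom


def select_features(docs, labels, k):
    """Return the k terms with the highest per-class chi-square score.

    Assumption: each term is scored by its maximum chi-square value
    over all classes; ties are broken alphabetically.
    """
    doc_sets = [set(d) for d in docs]
    vocab = set().union(*doc_sets)
    class_sizes = Counter(labels)
    n_docs = len(docs)
    scores = {}
    for term in vocab:
        best = 0.0
        for c, size in class_sizes.items():
            n11 = sum(1 for s, y in zip(doc_sets, labels) if y == c and term in s)
            n10 = sum(1 for s, y in zip(doc_sets, labels) if y != c and term in s)
            n01 = size - n11
            n00 = (n_docs - size) - n10
            best = max(best, chi_square(n11, n10, n01, n00))
        scores[term] = best
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Toy corpus (hypothetical): "the" occurs everywhere and should be dropped.
docs = [
    ["goal", "match", "the"],
    ["goal", "team", "the"],
    ["stock", "market", "the"],
    ["stock", "price", "the"],
]
labels = ["sports", "sports", "finance", "finance"]
top_terms = select_features(docs, labels, 2)
```

The baseline counts only whether a term appears in a document, not how often or how concentrated it is in one class, which is consistent with the kind of shortcoming the abstract says the improved method targets.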
Pages: 6