Importance Weighted Feature Selection Strategy for Text Classification

被引:0
|
作者
Li, Baoli [1 ]
机构
[1] Henan Univ Technol, Dept Comp Sci, Zhengzhou, Peoples R China
关键词
Importance Weighted Feature Selection; Feature Selection; Information Gain; Chi-square; Text Classification; CATEGORIZATION;
D O I
暂无
中图分类号
TP301 [理论、方法];
学科分类号
081202 ;
摘要
Feature selection, which aims at obtaining a compact and effective feature subset for better performance and higher efficiency, has been studied for decades. The traditional feature selection metrics, such as Chi-square and information gain, fail to consider how important a feature is in a document. Features, no matter how much effective semantic information they hold, are treated equally. Intuitively, thus calculated feature selection metrics are very likely to introduce much noise. We, therefore, in this study, extend the work of Li et al. [1] on document frequency metric, propose a general importance weighted feature selection strategy for text classification, in which the importance value of a feature in a document is derived from its relative frequency in that document. Extensive experiments with two state-of-the-art feature selection metrics (Chi-square and information gain) on three text classification datasets demonstrate the effectiveness of the proposed strategy.
引用
收藏
页码:344 / 347
页数:4
相关论文
共 50 条
  • [1] Feature Selection Strategy in Text Classification
    Fung, Pui Cheong Gabriel
    Morstatter, Fred
    Liu, Huan
    [J]. ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PT I: 15TH PACIFIC-ASIA CONFERENCE, PAKDD 2011, 2011, 6634 : 26 - 37
  • [2] Weighted Document Frequency for Feature Selection in Text Classification
    Li, Baoli
    Yan, Qiuling
    Xu, Zhenqiang
    Wang, Guicai
    [J]. PROCEEDINGS OF 2015 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING, 2015, : 132 - 135
  • [3] Dynamic Feature Selection Strategy in Incremental Chinese Text Classification
    Yang, Dan
    Fan, Xinghua
    [J]. 2012 2ND INTERNATIONAL CONFERENCE ON APPLIED ROBOTICS FOR THE POWER INDUSTRY (CARPI), 2012, : 1123 - 1126
  • [4] Feature Selection in Text Classification
    Sahin, Durmus Ozkan
    Ates, Nurullah
    Kilic, Erdal
    [J]. 2016 24TH SIGNAL PROCESSING AND COMMUNICATION APPLICATION CONFERENCE (SIU), 2016, : 1777 - 1780
  • [5] Text Guide: Improving the Quality of Long Text Classification by a Text Selection Method Based on Feature Importance
    Fiok, Krzysztof
    Karwowski, Waldemar
    Gutierrez-Franco, Edgar
    Davahli, Mohammad Reza
    Wilamowski, Maciej
    Ahram, Tareq
    Al-Juaid, Awad
    Zurada, Jozef
    [J]. IEEE ACCESS, 2021, 9 : 105439 - 105450
  • [6] Dynamic feature selection in text classification
    Doan, Son
    Horiguchi, Susumu
    [J]. INTELLIGENT CONTROL AND AUTOMATION, 2006, 344 : 664 - 675
  • [7] Contextual feature selection for text classification
    Paradis, Francois
    Nie, Jian-Yun
    [J]. INFORMATION PROCESSING & MANAGEMENT, 2007, 43 (02) : 344 - 352
  • [8] Hybrid feature selection for text classification
    Gunal, Serkan
    [J]. TURKISH JOURNAL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCES, 2012, 20 : 1296 - 1311
  • [9] Feature selection for text classification: A review
    Deng, Xuelian
    Li, Yuqing
    Weng, Jian
    Zhang, Jilian
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2019, 78 (03) : 3797 - 3816
  • [10] Feature selection for text classification: A review
    Xuelian Deng
    Yuqing Li
    Jian Weng
    Jilian Zhang
    [J]. Multimedia Tools and Applications, 2019, 78 : 3797 - 3816