Feature Selection Strategy in Text Classification

Cited by: 0
Authors
Fung, Pui Cheong Gabriel [1]
Morstatter, Fred [1]
Liu, Huan [1]
Affiliations
[1] Arizona State Univ, Tempe, AZ 85287 USA
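Keywords
Feature Selection; Feature Ranking; Text Classification; Selection Strategy
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract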
Traditionally, the best number of features is determined by a so-called "rule of thumb" or by using a separate validation dataset. We can find no explanation of why these approaches lead to the best number, nor is there a formal feature selection model for obtaining it. In this paper, we conduct an in-depth empirical analysis and argue that simply selecting the features with the highest scores may not be the best strategy. A highest-scores approach reduces many documents to zero length, so that they cannot contribute to the training process. Accordingly, we formulate the feature selection process as a dual-objective optimization problem and identify the best number of features for each document automatically. Extensive experiments are conducted to verify our claims. The encouraging results indicate that our proposed framework is effective.
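The zero-length effect described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy corpus, the class labels, and the scoring function (absolute difference in per-class document frequency) are hypothetical and are not taken from the paper, and the sketch does not attempt the paper's dual-objective formulation. It only shows how keeping a single global top-k list of terms can strip a document of every feature.

```python
from collections import Counter

# Hypothetical toy corpus: (tokens, class label). Not data from the paper.
docs = [
    (["goal", "match", "team"], "sport"),
    (["goal", "team", "coach"], "sport"),
    (["vote", "party", "senate"], "politics"),
    (["vote", "party", "budget"], "politics"),
    (["weather", "rain"], "sport"),
]


def score_features(docs):
    """Assumed score: |df in 'sport' - df in 'politics'| for each term."""
    df = {}
    for tokens, label in docs:
        for term in set(tokens):
            df.setdefault(term, Counter())[label] += 1
    return {t: abs(c["sport"] - c["politics"]) for t, c in df.items()}


def select_top_k(scores, k):
    """Keep the k highest-scoring terms; ties are broken alphabetically."""
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return set(ranked[:k])


def project(docs, vocab):
    """Drop every token that is not in the selected vocabulary."""
    return [[t for t in tokens if t in vocab] for tokens, _ in docs]


scores = score_features(docs)
selected = select_top_k(scores, k=4)
reduced = project(docs, selected)
empty_count = sum(1 for tokens in reduced if not tokens)

print("selected terms:", sorted(selected))
print("documents left with no features:", empty_count)
```

On this toy data the fifth document contains only low-scoring terms, so it keeps none of them after the global cut and would contribute nothing to training. A per-document choice of how many features to keep, as the abstract proposes, is one way to avoid that outcome.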
Pages: 26-37
Page count: 12