Feature selection based on term frequency deviation rate for text classification

Authors
Hongfang Zhou
Yiming Ma
Xiang Li
Affiliation
[1] Xi’an University of Technology, School of Computer Science and Engineering
Source
Applied Intelligence | 2021 / Vol. 51
Keywords
Text classification; Feature selection; Term frequency; Document frequency; Deviation ratio
DOI
Not available
Abstract
Feature selection is a technique for selecting a subset of the most relevant features for model training. In this paper, a new concept, the term frequency deviation rate (TDR), is first proposed to improve classification accuracy. Then, a TDR-based feature selection algorithm for text classification is presented. Finally, extensive experiments are conducted on seven datasets (K1a, K1b, WAP, R52, R8, 20Newsgroups, and Cade12) with two classifiers, Naive Bayes and Support Vector Machine. The experimental results indicate that the new approach improves classification accuracy by 7.9% on average.
Pages: 3255 - 3274
Number of pages: 19
Related papers
  • [1] Feature selection based on term frequency deviation rate for text classification
    Zhou, Hongfang
    Ma, Yiming
    Li, Xiang
    [J]. APPLIED INTELLIGENCE, 2021, 51 (06) : 3255 - 3274
  • [2] Feature selection based on absolute deviation factor for text classification
    Jin, Lingbin
    Zhang, Li
    Zhao, Lei
    [J]. INFORMATION PROCESSING & MANAGEMENT, 2023, 60 (03)
  • [3] OPTIMAL FEATURE SUBSET SELECTION BASED ON COMBINING DOCUMENT FREQUENCY AND TERM FREQUENCY FOR TEXT CLASSIFICATION
    Karpagalingam, Thirumoorthy
    Karuppaiah, Muneeswaran
    [J]. COMPUTING AND INFORMATICS, 2020, 39 (05) : 881 - 906
  • [4] Optimal feature subset selection based on combining document frequency and term frequency for text classification
    Karpagalingam, Thirumoorthy
    Karuppaiah, Muneeswaran
    [J]. Computing and Informatics, 2021, 39 (05) : 881 - 906
  • [5] Feature selection based on long short term memory for text classification
    Ming Hong
    Heyong Wang
    [J]. Multimedia Tools and Applications, 2024, 83 : 44333 - 44378
  • [6] Feature selection based on long short term memory for text classification
    Hong, Ming
    Wang, Heyong
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 83 (15) : 44333 - 44378
  • [7] Relative term-frequency based feature selection for text categorization
    Yang, SM
    Wu, XB
    Deng, ZH
    Zhang, M
    Yang, DQ
    [J]. 2002 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-4, PROCEEDINGS, 2002, : 1432 - 1436
  • [8] Weighted Document Frequency for Feature Selection in Text Classification
    Li, Baoli
    Yan, Qiuling
    Xu, Zhenqiang
    Wang, Guicai
    [J]. PROCEEDINGS OF 2015 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING, 2015, : 132 - 135
  • [9] Comparison of term frequency and document frequency based feature selection metrics in text categorization
    Azam, Nouman
    Yao, JingTao
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2012, 39 (05) : 4760 - 4768
  • [10] Feature Selection in Text Classification
    Sahin, Durmus Ozkan
    Ates, Nurullah
    Kilic, Erdal
    [J]. 2016 24TH SIGNAL PROCESSING AND COMMUNICATION APPLICATION CONFERENCE (SIU), 2016, : 1777 - 1780