Weighted Document Frequency for Feature Selection in Text Classification

Authors
Li, Baoli [1]
Yan, Qiuling [1]
Xu, Zhenqiang [1]
Wang, Guicai [1]
Affiliations
[1] Henan Univ Technol, Coll Informat Sci & Engn, Zhengzhou, Peoples R China
Keywords
Document Frequency; Weighted Document Frequency; Feature Selection; Text Classification; Text Categorization;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Past research has shown Document Frequency (DF) to be a simple yet quite effective measure for feature selection in text classification. It is computed from the number of documents in a collection that contain a feature, which can be a word, a phrase, an n-gram, or a specially derived attribute. Counting follows a binary strategy: if a feature appears in a document, its DF is increased by one. This traditional DF metric considers only whether a feature appears in a document, not how important the feature is in that document. Document frequency counted this way is therefore very likely to introduce considerable noise. To reduce such noise to some extent, a weighted document frequency (WDF) is proposed. Extensive experiments on two text classification data sets demonstrate the effectiveness of the proposed measure.
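The contrast between binary counting and importance-weighted counting can be sketched as follows. This is a minimal illustration, not the paper's method: the abstract does not give the WDF formula, so the weight used here (a feature's relative term frequency within each document) is an assumption, and the function names and toy corpus are hypothetical.

```python
from collections import Counter


def document_frequency(docs):
    """Binary DF: every document that contains a feature adds exactly 1."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # set() so repeated tokens count once per document
    return df


def weighted_document_frequency(docs):
    """Illustrative weighted DF (assumed weighting, not taken from the paper):
    every document adds the feature's relative term frequency, i.e.
    occurrences of the feature divided by the document length."""
    wdf = Counter()
    for tokens in docs:
        if not tokens:
            continue  # skip empty documents to avoid division by zero
        for term, count in Counter(tokens).items():
            wdf[term] += count / len(tokens)
    return wdf


# Hypothetical toy corpus of tokenized documents.
docs = [
    ["wheat", "price", "wheat", "export"],
    ["price", "market"],
    ["wheat", "market", "market", "market"],
]

df = document_frequency(docs)
wdf = weighted_document_frequency(docs)

# Rank features by each score; a selector would keep the top-k features.
for term in sorted(df, key=lambda t: (-wdf[t], t)):
    print(f"{term}: DF={df[term]}, weighted DF={wdf[term]:.2f}")
```

In this toy corpus "wheat", "price", and "market" all have a binary DF of 2, while the weighted variant separates them because "market" dominates one of its documents. Any other per-document importance weight (for example a normalized TF-IDF score) could be substituted under the same scheme.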
Pages: 132-135
Number of pages: 4