Weighted Document Frequency for Feature Selection in Text Classification

Authors
Li, Baoli [1]
Yan, Qiuling [1]
Xu, Zhenqiang [1]
Wang, Guicai [1]
Affiliation
[1] Henan Univ Technol, Coll Informat Sci & Engn, Zhengzhou, Peoples R China
Keywords
Document Frequency; Weighted Document Frequency; Feature Selection; Text Classification; Text Categorization;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Past research has shown Document Frequency (DF) to be a simple yet quite effective measure for feature selection in text classification. It is computed from the number of documents in a collection that contain a feature, which can be a word, a phrase, an n-gram, or a specially derived attribute. Counting follows a binary strategy: if a feature appears in a document, its DF is increased by one. This traditional DF metric thus considers only whether a feature appears in a document, not how important the feature is in that document, so the resulting counts are likely to introduce considerable noise. A weighted document frequency (WDF) is therefore proposed to reduce this noise to some extent. Extensive experiments on two text classification data sets demonstrate the effectiveness of the proposed measure.
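The contrast between binary and weighted counting can be illustrated with a minimal sketch. The abstract does not give the WDF formula, so the weighting used here (a feature's relative frequency within each document) is an assumption chosen only for illustration and should not be read as the authors' definition. The function names and the toy corpus are hypothetical.

```python
from collections import Counter


def document_frequency(docs):
    """Binary DF: every document that contains a feature adds exactly 1."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # set() so repeated occurrences count once
    return df


def weighted_document_frequency(docs):
    """Illustrative WDF: every document adds the feature's relative
    frequency (count / document length) instead of a flat 1.
    ASSUMPTION: this weighting is a stand-in, not the paper's formula."""
    wdf = Counter()
    for tokens in docs:
        if not tokens:
            continue  # skip empty documents to avoid division by zero
        counts = Counter(tokens)
        for feature, count in counts.items():
            wdf[feature] += count / len(tokens)
    return wdf


# Hypothetical toy corpus: each document is a list of word features.
docs = [
    ["price", "stock", "stock", "stock"],
    ["stock", "weather", "rain", "rain"],
    ["price", "rain", "rain", "rain"],
]

df = document_frequency(docs)
wdf = weighted_document_frequency(docs)

for feature in sorted(df):
    print(feature, df[feature], wdf[feature])
```

In this toy corpus, "price" and "stock" both appear in two documents and so receive the same DF, but "stock" dominates one of its documents and receives a higher weighted score under the assumed weighting. This is the kind of distinction the binary count cannot make.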
Pages: 132-135
Page count: 4