Modified DFS-based term weighting scheme for text classification

Cited by: 18
Authors
Chen, Long [1]
Jiang, Liangxiao [1, 2]
Li, Chaoqun [3]
Affiliations
[1] China Univ Geosci, Sch Comp Sci, Wuhan 430074, Peoples R China
[2] China Univ Geosci, Hubei Key Lab Intelligent Geoinformat Proc, Wuhan 430074, Peoples R China
[3] China Univ Geosci, Sch Math & Phys, Wuhan 430074, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Text classification; Term weighting; Term frequency; Distinguishing feature selector; Statistical comparisons; Classifiers
DOI
10.1016/j.eswa.2020.114438
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
With the rapid growth of textual data on the Internet, text classification (TC) has attracted increasing attention. As a widely used text representation method, the vector space model (VSM) represents the content of a document as a vector composed of term frequency (TF) in the term space. Because different terms have different levels of importance in a document, designing an appropriate term weighting scheme is crucial to improve the performance of TC. In this study, we first conducted a comprehensive survey of the existing well-known term weighting schemes and found that they are not fully effective and that researchers are still focused on proposing new term weighting schemes. To further improve the performance of TC, we propose a new term weighting scheme based on the modified distinguishing feature selector (DFS), which we call TF-MDFS (modified DFS-based TF). Experimental results show that TF-MDFS is overall better than existing state-of-the-art term weighting schemes in terms of the classification accuracy of widely used base classifiers.
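The abstract does not state the MDFS formula, so the following is only a minimal sketch of the general idea it builds on: weighting raw term frequency by a per-term discrimination score. It uses the original (unmodified) distinguishing feature selector, DFS(t) = sum over classes C of P(C|t) / (P(not t|C) + P(t|not C) + 1), with probabilities estimated from document counts. The function names `dfs_scores` and `tf_dfs_vector` and the toy corpus are hypothetical, chosen for illustration, and the result is a TF-DFS weighting, not the paper's TF-MDFS.

```python
from collections import Counter


def dfs_scores(docs, labels):
    """Original DFS score per term (not the paper's modified DFS).

    docs: list of token lists; labels: class label of each document.
    """
    n_docs = len(docs)
    classes = sorted(set(labels))
    class_size = Counter(labels)

    # Document frequency of each term, overall and within each class.
    df = Counter()
    df_class = {c: Counter() for c in classes}
    for tokens, label in zip(docs, labels):
        for term in set(tokens):
            df[term] += 1
            df_class[label][term] += 1

    scores = {}
    for term, n_t in df.items():
        total = 0.0
        for c in classes:
            n_tc = df_class[c][term]          # docs of class c containing term
            n_other = n_docs - class_size[c]  # docs outside class c
            p_c_given_t = n_tc / n_t
            p_not_t_given_c = 1 - n_tc / class_size[c]
            p_t_given_not_c = (n_t - n_tc) / n_other if n_other else 0.0
            total += p_c_given_t / (p_not_t_given_c + p_t_given_not_c + 1)
        scores[term] = total
    return scores


def tf_dfs_vector(tokens, scores):
    """Weight each term's raw frequency in one document by its DFS score."""
    tf = Counter(tokens)
    return {term: freq * scores.get(term, 0.0) for term, freq in tf.items()}


# Toy corpus (assumed data, for illustration only).
docs = [
    ["goal", "match", "team"],
    ["team", "goal", "goal"],
    ["stock", "market", "team"],
    ["market", "price"],
]
labels = ["sport", "sport", "finance", "finance"]

scores = dfs_scores(docs, labels)
weights = tf_dfs_vector(["goal", "goal", "team"], scores)
```

In this toy corpus, "goal" appears in every sport document and in no finance document, so it receives the maximum DFS score, while "team" appears in both classes and is weighted lower. The resulting dictionary plays the role of one document's row in the vector space model.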
Pages: 9