Modified DFS-based term weighting scheme for text classification

Cited by: 18
Authors
Chen, Long [1]
Jiang, Liangxiao [1, 2]
Li, Chaoqun [3]
Affiliations
[1] China Univ Geosci, Sch Comp Sci, Wuhan 430074, Peoples R China
[2] China Univ Geosci, Hubei Key Lab Intelligent Geoinformat Proc, Wuhan 430074, Peoples R China
[3] China Univ Geosci, Sch Math & Phys, Wuhan 430074, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Text classification; Term weighting; Term frequency; Distinguishing feature selector; STATISTICAL COMPARISONS; CLASSIFIERS;
DOI
10.1016/j.eswa.2020.114438
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
With the rapid growth of textual data on the Internet, text classification (TC) has attracted increasing attention. As a widely used text representation method, the vector space model (VSM) represents the content of a document as a vector composed of term frequency (TF) in the term space. Because different terms have different levels of importance in a document, designing an appropriate term weighting scheme is crucial to improve the performance of TC. In this study, we first conducted a comprehensive survey of the existing well-known term weighting schemes and found that they are not fully effective and that researchers are still focused on proposing new term weighting schemes. To further improve the performance of TC, we propose a new term weighting scheme based on the modified distinguishing feature selector (DFS), which we call TF-MDFS (modified DFS-based TF). Experimental results show that TF-MDFS is overall better than existing state-of-the-art term weighting schemes in terms of the classification accuracy of widely used base classifiers.
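The abstract's pipeline (TF vectors in the vector space model, rescaled by a per-term weight) can be illustrated with a minimal sketch. This record does not give the modified DFS formula, so the sketch below uses the original distinguishing feature selector as it is commonly stated, DFS(t) = Σ_i P(C_i|t) / (P(t̄|C_i) + P(t|C̄_i) + 1), with document-level probability estimates; that choice is an assumption, not the authors' TF-MDFS. The function names `tf_vectors`, `dfs_scores`, and `weighted_vectors` are hypothetical.

```python
from collections import Counter


def tf_vectors(docs, vocab):
    """Represent each tokenized document as a raw term-frequency vector over vocab."""
    vectors = []
    for tokens in docs:
        counts = Counter(tokens)
        vectors.append([counts[t] for t in vocab])
    return vectors


def dfs_scores(docs, labels, vocab):
    """Original DFS score per term (assumed formula; NOT the paper's modified DFS)."""
    classes = sorted(set(labels))
    doc_sets = [set(tokens) for tokens in docs]
    scores = {}
    for term in vocab:
        has_term = [term in s for s in doc_sets]
        n_term = sum(has_term)  # documents containing the term
        score = 0.0
        for c in classes:
            in_c = [y == c for y in labels]
            n_c = sum(in_c)  # documents in class c (always > 0)
            n_not_c = len(docs) - n_c
            n_term_c = sum(h and k for h, k in zip(has_term, in_c))
            p_c_given_t = n_term_c / n_term if n_term else 0.0
            p_not_t_given_c = (n_c - n_term_c) / n_c
            p_t_given_not_c = (n_term - n_term_c) / n_not_c if n_not_c else 0.0
            score += p_c_given_t / (p_not_t_given_c + p_t_given_not_c + 1.0)
        scores[term] = score
    return scores


def weighted_vectors(docs, labels):
    """Multiply each TF entry by its term's DFS score (a TF-DFS style weighting)."""
    vocab = sorted({t for tokens in docs for t in tokens})
    tf = tf_vectors(docs, vocab)
    dfs = dfs_scores(docs, labels, vocab)
    weighted = [[f * dfs[t] for f, t in zip(row, vocab)] for row in tf]
    return vocab, weighted
```

Under this formula, a term that occurs in every document of exactly one class scores 1.0, while a term spread evenly over two classes scores 0.5, so class-specific terms are weighted up relative to shared ones.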
Pages: 9
Related papers
  • [1] An improved term weighting scheme for text classification
    Tang, Zhong
    Li, Wenqiang
    Li, Yan
    [J]. CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2020, 32 (09):
  • [2] A simple probability based term weighting scheme for automated text classification
    Liu, Ying
    Loh, Han Tong
    [J]. NEW TRENDS IN APPLIED ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 2007, 4570 : 33 - +
  • [3] A Term Weighting Scheme Approach for Vietnamese Text Classification
    Vu Thanh Nguyen
    Nguyen Tri Hai
    Nguyen Hoang Nghia
    Tuan Dinh Le
    [J]. FUTURE DATA AND SECURITY ENGINEERING, FDSE 2015, 2015, 9446 : 46 - 53
  • [4] A Novel Term Weighting Scheme for Imbalanced Text Classification
    Tantisripreecha, Tanapon
    Soonthornphisaj, Nuanwan
    [J]. Informatica (Slovenia), 2022, 46 (02): 259 - 268
  • [5] A Novel Term Weighting Scheme for Imbalanced Text Classification
    Tantisripreecha, Tanapon
    Soonthornphisaj, Nuanwan
    [J]. INFORMATICA-AN INTERNATIONAL JOURNAL OF COMPUTING AND INFORMATICS, 2022, 46 (02): 259 - 268
  • [6] Modified frequency-based term weighting schemes for text classification
    Sabbah, Thabit
    Selamat, Ali
    Selamat, Md Hafiz
    Al-Anzi, Fawaz S.
    Viedma, Enrique Herrera
    Krejcar, Ondrej
    Fujita, Hamido
    [J]. APPLIED SOFT COMPUTING, 2017, 58 : 193 - 206
  • [7] Supervised Graph-Based Term Weighting Scheme for Effective Text Classification
    Shanavas, Niloofer
    Wang, Hui
    Lin, Zhiwei
    Hawe, Glenn
    [J]. ECAI 2016: 22ND EUROPEAN CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2016, 285 : 1710 - 1711
  • [8] An improved supervised term weighting scheme for text representation and classification
    Tang, Zhong
    Li, Wenqiang
    Li, Yan
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2022, 189
  • [9] A probabilistic model derived term weighting scheme for text classification
    Feng, Guozhong
    Li, Shaoting
    Sun, Tieli
    Zhang, Bangzuo
    [J]. PATTERN RECOGNITION LETTERS, 2018, 110 : 23 - 29
  • [10] Using modified term frequency to improve term weighting for text classification
    Chen, Long
    Jiang, Liangxiao
    Li, Chaoqun
    [J]. ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2021, 101