An improved supervised term weighting scheme for text representation and classification

Cited by: 8
Authors
Tang, Zhong [1 ,2 ]
Li, Wenqiang [1 ,2 ]
Li, Yan [1 ,2 ]
Affiliations
[1] Sichuan Univ, Sch Mech Engn, 24 South Sect,1 Yihuan Rd, Chengdu 610065, Peoples R China
[2] Innovat Method & Creat Design Key Lab Sichuan Pro, Chengdu 610065, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Supervised term weighting; Text representation; Text classification; Cumulative residual entropy; Proportional distortion function; FEATURE-SELECTION; NAIVE BAYES; FREQUENCY; CATEGORIZATION; FRAMEWORK;
DOI
10.1016/j.eswa.2021.115985
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The term weighting scheme has a significant effect on text classification performance, because it determines how texts are represented in the vector space model. Term frequency-inverse document frequency (TF-IDF) is currently the most widely used term weighting scheme, but it does not use the category information available in the training texts. This study takes that category information (the category factor) into account and develops an improved supervised term weighting method for text representation that combines a new information measure, cumulative residual entropy, with the proportional distortion function. To verify the classification performance of the proposed scheme, we conducted an extensive experimental comparison with existing schemes on two corpora with different characteristics, Reuters-21578 and 20 Newsgroups. The results show that the proposed scheme achieves significantly better text classification performance than the other schemes. Specifically, with a linear support vector machine classifier, micro-F1 improved to 0.972 on Reuters-21578 and 0.833 on 20 Newsgroups.
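To illustrate the limitation the abstract attributes to TF-IDF, the following is a minimal sketch of one common TF-IDF variant (raw term frequency times log(N/df)). It is not the weighting scheme proposed in the paper; the toy corpus, the labels, and the function name `tf_idf` are assumptions made for illustration only.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: raw term frequency times idf = log(N / df).
    Many other TF-IDF variants exist.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights


# Hypothetical toy corpus; the labels are never passed to tf_idf.
docs = [
    "oil prices rise".split(),
    "oil exports fall".split(),
    "team wins match".split(),
    "team loses match".split(),
]
labels = ["energy", "energy", "sport", "sport"]

vectors = tf_idf(docs)
```

In this sketch the weights depend only on term counts and document frequencies, so relabeling the documents would leave every weight unchanged. A supervised scheme, by contrast, would also take `labels` as input.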
Pages: 12
Related papers
50 in total
  • [1] An improved term weighting scheme for text classification
    Tang, Zhong
    Li, Wenqiang
    Li, Yan
    [J]. CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2020, 32 (09):
  • [2] Supervised term-category feature weighting for improved text classification
    Attieh, Joseph
    Tekli, Joe
    [J]. KNOWLEDGE-BASED SYSTEMS, 2023, 261
  • [3] Supervised Graph-Based Term Weighting Scheme for Effective Text Classification
    Shanavas, Niloofer
    Wang, Hui
    Lin, Zhiwei
    Hawe, Glenn
    [J]. ECAI 2016: 22ND EUROPEAN CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2016, 285 : 1710 - 1711
  • [4] Combining supervised term-weighting metrics for SVM text classification with extended term representation
    Mounia Haddoud
    Aïcha Mokhtari
    Thierry Lecroq
    Saïd Abdeddaïm
    [J]. Knowledge and Information Systems, 2016, 49 : 909 - 931
  • [5] Combining supervised term-weighting metrics for SVM text classification with extended term representation
    Haddoud, Mounia
    Mokhtari, Aicha
    Lecroq, Thierry
    Abdeddaim, Said
    [J]. KNOWLEDGE AND INFORMATION SYSTEMS, 2016, 49 (03) : 909 - 931
  • [6] An improved method of term weighting for text classification
    Jiang, Hua
    Li, Ping
    Hu, Xin
    Wang, Shuyan
    [J]. 2009 IEEE INTERNATIONAL CONFERENCE ON INTELLIGENT COMPUTING AND INTELLIGENT SYSTEMS, PROCEEDINGS, VOL 1, 2009, : 294 - 298
  • [7] On Term Frequency Factor in Supervised Term Weighting Schemes for Text Classification
    Dogan, Turgut
    Uysal, Alper Kursat
    [J]. ARABIAN JOURNAL FOR SCIENCE AND ENGINEERING, 2019, 44 (11) : 9545 - 9560
  • [8] On Term Frequency Factor in Supervised Term Weighting Schemes for Text Classification
    Turgut Dogan
    Alper Kursat Uysal
    [J]. Arabian Journal for Science and Engineering, 2019, 44 : 9545 - 9560
  • [9] An Improved Term Weighting Scheme for Sentiment Classification
    Zhang, Pu
    Wang, Yinghao
    Wang, Junxia
    Zeng, Xianhua
    Wang, Yong
    [J]. 2017 IEEE 2ND ADVANCED INFORMATION TECHNOLOGY, ELECTRONIC AND AUTOMATION CONTROL CONFERENCE (IAEAC), 2017, : 462 - 466
  • [10] A Term Weighting Scheme Approach for Vietnamese Text Classification
    Vu Thanh Nguyen
    Nguyen Tri Hai
    Nguyen Hoang Nghia
    Tuan Dinh Le
    [J]. FUTURE DATA AND SECURITY ENGINEERING, FDSE 2015, 2015, 9446 : 46 - 53