A novel term weighting scheme for text classification: TF-MONO

Cited by: 23
Authors
Dogan, Turgut [1]
Uysal, Alper Kursat [2]
Affiliations
[1] Trakya Univ, Dept Comp Engn, Edirne, Turkey
[2] Eskisehir Tech Univ, Dept Comp Engn, Eskisehir, Turkey
Keywords
Text classification; Supervised term weighting; Max-occurrence; Non-occurrence; SELECTION; CLASSIFIERS;
DOI
10.1016/j.joi.2020.101076
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Subject classification code
081203; 0835
Abstract
Effectively representing the relationship between documents and their contents is crucial for improving the performance of text classification. Term weighting is a preprocessing step that aims to represent text documents better in the vector space by assigning proper weights to terms. Because the calculation of appropriate weight values directly affects classification performance, term weighting remains an important sub-research area of text classification. In this study, we propose a novel term weighting strategy (MONO) that uses the non-occurrence information of terms more effectively than existing term weighting approaches in the literature. The proposed strategy also performs intra-class document scaling to better represent the distinguishing capabilities of terms that occur in different numbers of documents within the same number of classes. Based on the MONO weighting strategy, two novel supervised term weighting schemes, TF-MONO and SRTF-MONO, are proposed for text classification. The proposed schemes were tested with two classifiers, SVM and KNN, on three datasets: Reuters-21578, 20-Newsgroups, and WebKB. Their classification performance was compared with that of five existing term weighting schemes from the literature: TF-IDF, TF-IDF-ICF, TF-RF, TF-IDF-ICSDF, and TF-IGM. The results obtained from the seven schemes show that SRTF-MONO generally outperformed the other schemes on all three datasets. Moreover, TF-MONO yielded promising Micro-F1 and Macro-F1 results compared with the five benchmark term weighting methods, especially on the Reuters-21578 and 20-Newsgroups datasets. © 2020 Elsevier Ltd. All rights reserved.
Pages: 15
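
Illustrative note: the abstract describes term weighting as assigning weights to terms so that documents are better represented in the vector space. The sketch below shows this for TF-IDF, one of the baseline schemes the abstract names. It is not the MONO scheme, whose formula is not given in this record. The idf form log(N/df), the function name `tf_idf`, and the toy documents are assumptions chosen for illustration only.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: raw term frequency times log(N / df).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weighted = []
    for doc in docs:
        tf = Counter(doc)  # raw term frequency within this document
        weighted.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weighted


# Hypothetical toy corpus, already tokenized.
docs = [
    ["wheat", "price", "rise"],
    ["wheat", "export", "fall"],
    ["team", "win", "match"],
]
weights = tf_idf(docs)
```

Under this variant, "wheat" (present in two of three documents) receives a lower weight than "price" (present in one). TF-IDF is unsupervised and ignores class labels; the supervised schemes discussed in the abstract additionally use how terms are distributed across classes.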