A novel term weighting scheme for text classification: TF-MONO

Cited by: 23
Authors
Dogan, Turgut [1]
Uysal, Alper Kursat [2]
Affiliations
[1] Trakya Univ, Dept Comp Engn, Edirne, Turkey
[2] Eskisehir Tech Univ, Dept Comp Engn, Eskisehir, Turkey
Keywords
Text classification; Supervised term weighting; Max-occurrence; Non-occurrence; SELECTION; CLASSIFIERS
DOI
10.1016/j.joi.2020.101076
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Effectively representing the relationship between documents and their contents is crucial to improving the performance of text classification. Term weighting is a preprocessing step that aims to represent text documents better in the vector space by assigning proper weights to terms. Because the computed weight values directly affect classification performance, term weighting remains one of the important research sub-areas of text classification. In this study, we propose a novel term weighting strategy (MONO) that uses the non-occurrence information of terms more effectively than existing term weighting approaches in the literature. The proposed strategy also performs intra-class document scaling to better represent the distinguishing capabilities of terms that occur in different numbers of documents within the same number of classes. Based on the MONO weighting strategy, two novel supervised term weighting schemes, TF-MONO and SRTF-MONO, are proposed for text classification. The proposed schemes were tested with two classifiers, SVM and KNN, on three datasets: Reuters-21578, 20-Newsgroups, and WebKB. Their classification performance was compared with that of five existing term weighting schemes from the literature: TF-IDF, TF-IDF-ICF, TF-RF, TF-IDF-ICSDF, and TF-IGM. The results obtained from the seven schemes show that SRTF-MONO generally outperformed the other schemes on all three datasets. Moreover, TF-MONO yielded promising Micro-F1 and Macro-F1 results compared with the five benchmark term weighting methods, especially on the Reuters-21578 and 20-Newsgroups datasets. (C) 2020 Elsevier Ltd. All rights reserved.
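As a point of reference for what "assigning weights to terms" means, the following is a minimal sketch of TF-IDF, the first baseline scheme named in the abstract. It is not the MONO strategy, whose formula the abstract does not give. The function name `tf_idf`, the toy corpus, and the choice of raw term frequency with an unsmoothed natural-log IDF are illustrative assumptions; published variants differ in normalization and smoothing.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: raw term frequency times log(N / df), no smoothing.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)  # raw count of each term in this document
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights

# Hypothetical toy corpus, already tokenized.
docs = [
    ["oil", "price", "rise"],
    ["oil", "export", "fall"],
    ["wheat", "price", "fall"],
]
weights = tf_idf(docs)
```

In this toy corpus, "rise" appears in one document and "oil" in two, so "rise" receives the larger weight in the first document. TF-IDF is unsupervised: it ignores class labels, whereas supervised schemes such as those proposed in the abstract also use how terms are distributed across classes.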
Pages: 15