Supervised term-category feature weighting for improved text classification

Cited by: 9
Authors
Attieh, Joseph [1]
Tekli, Joe [1, 2]
Affiliations
[1] Lebanese Amer Univ LAU, Elect & Comp Engn Dept, Byblos 36, Lebanon
[2] Univ Pau & Pays Adour UPPA, LIUPPA Lab, SPIDER Res Team, F-64600 Anglet, Aquitaine, France
Keywords
Text classification; Document and text processing; Feature Engineering; Supervised term weighting; Inverse Category Frequency; TF-IDF; Text representation; SCHEMES; MODEL;
DOI
10.1016/j.knosys.2022.110215
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Text classification is a central task in Natural Language Processing (NLP) that aims at categorizing text documents into predefined classes or categories. It requires appropriate features to describe the contents and meaning of text documents, and to map them to their target categories. Existing text feature representations rely on a weighted representation of the document terms. Hence, choosing a suitable method for term weighting is of major importance and can help increase the effectiveness of the classification task. In this study, we provide a novel text classification framework for Category-based Feature Engineering titled CFE. It consists of a supervised weighting scheme defined based on a variant of the TF-ICF (Term Frequency-Inverse Category Frequency) model, embedded into three new lean classification approaches: (i) IterativeAdditive (flat), (ii) GradientDescentANN (1-layered), and (iii) FeedForwardANN (2-layered). The IterativeAdditive approach augments each document representation with a set of synthetic features inferred from TF-ICF category representations. It builds a term-category TF-ICF matrix using an iterative and additive algorithm that produces category vector representations and updates them until reaching convergence. GradientDescentANN replaces this iterative additive process by computing the term-category matrix using a gradient descent ANN model. Training the ANN using the gradient descent algorithm allows updating the term-category matrix until reaching convergence. FeedForwardANN uses a feed-forward ANN model to transform document representations into the category vector space. The transformed document vectors are then compared with the target category vectors, and are associated with the most similar categories. We have implemented CFE including its three classification approaches, and we have conducted a large battery of tests to evaluate their performance. Experimental results on five benchmark datasets show that our lean approaches mostly improve text classification accuracy while requiring significantly less computation time compared with their deep model alternatives.
(c) 2022 Elsevier B.V. All rights reserved.
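To make the term-category matrix mentioned in the abstract more concrete, here is a minimal Python sketch of a plain TF-ICF weighting. It is not the CFE variant from the paper, whose exact formula the abstract does not give. The smoothed inverse category frequency log(1 + |C| / cf(t)) is one common form and is an assumption here, as are the function names `tf_icf_matrix` and `classify` and the simple sum-of-weights scoring rule.

```python
import math
from collections import Counter, defaultdict


def tf_icf_matrix(docs, labels):
    """Build {category: {term: weight}} from tokenized docs and their labels.

    Assumed weighting (not taken from the paper):
        weight(t, c) = tf(t, c) * log(1 + |C| / cf(t))
    where tf(t, c) counts occurrences of t in documents of category c,
    |C| is the number of categories, and cf(t) is the number of
    categories in which t occurs at least once.
    """
    # Term counts accumulated per category.
    tf = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        tf[label].update(tokens)

    n_categories = len(tf)

    # Category frequency: in how many categories does each term appear?
    cf = Counter()
    for counts in tf.values():
        cf.update(counts.keys())

    return {
        cat: {t: n * math.log(1 + n_categories / cf[t]) for t, n in counts.items()}
        for cat, counts in tf.items()
    }


def classify(tokens, weights):
    """Pick the category whose weights sum highest over the given tokens."""
    scores = {
        cat: sum(w.get(t, 0.0) for t in tokens) for cat, w in weights.items()
    }
    return max(scores, key=scores.get)


# Toy corpus, invented purely for illustration.
docs = [["goal", "match", "team"], ["team", "vote"], ["vote", "election", "party"]]
labels = ["sports", "sports", "politics"]
weights = tf_icf_matrix(docs, labels)
predicted = classify(["election", "vote"], weights)
```

Terms that occur in a single category (such as "goal") receive a larger ICF factor than terms shared across categories (such as "vote"), which is the intuition behind category-based weighting.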
Pages: 17
Related papers
50 in total
  • [1] An improved supervised term weighting scheme for text representation and classification
    Tang, Zhong
    Li, Wenqiang
    Li, Yan
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2022, 189
  • [2] An improved term weighting scheme for text classification
    Tang, Zhong
    Li, Wenqiang
    Li, Yan
    [J]. CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2020, 32 (09):
  • [3] An improved method of term weighting for text classification
    Jiang, Hua
    Li, Ping
    Hu, Xin
    Wang, Shuyan
    [J]. 2009 IEEE INTERNATIONAL CONFERENCE ON INTELLIGENT COMPUTING AND INTELLIGENT SYSTEMS, PROCEEDINGS, VOL 1, 2009, : 294 - 298
  • [4] On Term Frequency Factor in Supervised Term Weighting Schemes for Text Classification
    Dogan, Turgut
    Uysal, Alper Kursat
    [J]. ARABIAN JOURNAL FOR SCIENCE AND ENGINEERING, 2019, 44 (11) : 9545 - 9560
  • [5] On Term Frequency Factor in Supervised Term Weighting Schemes for Text Classification
    Turgut Dogan
    Alper Kursat Uysal
    [J]. Arabian Journal for Science and Engineering, 2019, 44 : 9545 - 9560
  • [6] RANDOM WALK TERM WEIGHTING FOR IMPROVED TEXT CLASSIFICATION
    Hassan, Samer
    Mihalcea, Rada
    Banea, Carmen
    [J]. INTERNATIONAL JOURNAL OF SEMANTIC COMPUTING, 2007, 1 (04) : 421 - 439
  • [7] Supervised Contrastive Learning with Term Weighting for Improving Chinese Text Classification
    Guo, Jiabao
    Zhao, Bo
    Liu, Hui
    Liu, Yifan
    Zhong, Qian
    [J]. TSINGHUA SCIENCE AND TECHNOLOGY, 2023, 28 (01) : 59 - 68
  • [8] Structure-Based Supervised Term Weighting and Regularization for Text Classification
    Shanavas, Niloofer
    Wang, Hui
    Lin, Zhiwei
    Hawe, Glenn
    [J]. NATURAL LANGUAGE PROCESSING AND INFORMATION SYSTEMS (NLDB 2019), 2019, 11608 : 105 - 117
  • [9] Inverse-Category-Frequency Based Supervised Term Weighting Schemes for Text Categorization
    Wang, Deqing
    Zhang, Hui
    [J]. JOURNAL OF INFORMATION SCIENCE AND ENGINEERING, 2013, 29 (02) : 209 - 225
  • [10] Random-walk term weighting for improved text classification
    Hassan, Samer
    Mihalcea, Rada
    Banea, Carmen
    [J]. ICSC 2007: INTERNATIONAL CONFERENCE ON SEMANTIC COMPUTING, PROCEEDINGS, 2007, : 242 - +