Learning to Weight for Text Classification

Cited by: 0
Authors
Moreo, Alejandro [1]
Esuli, Andrea [1]
Sebastiani, Fabrizio [1]
Affiliations
[1] CNR, Ist Sci & Tecnol Informaz, I-56124 Pisa, Italy
Keywords
Training data; Task analysis; Training; Neural networks; Feature extraction; Time-frequency analysis; Information retrieval; Term weighting; supervised term weighting; text classification; neural networks; deep learning; TERM; SCHEMES
DOI
10.1109/TKDE.2018.2883446
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In information retrieval (IR) and related tasks, term weighting approaches typically consider the frequency of the term in the document and in the collection in order to compute a score reflecting the importance of the term for the document. In tasks characterized by the presence of training data (such as text classification) it seems logical that the term weighting function should take into account the distribution (as estimated from training data) of the term across the classes of interest. Although "supervised term weighting" approaches that use this intuition have been described before, they have failed to show consistent improvements. In this article, we analyze the possible reasons for this failure, and call consolidated assumptions into question. Following this criticism, we propose a novel supervised term weighting approach that, instead of relying on any predefined formula, learns a term weighting function optimized on the training set of interest; we dub this approach Learning to Weight (LTW). The experiments that we run on several well-known benchmarks, and using different learning methods, show that our method outperforms previous term weighting approaches in text classification.
Pages: 302-316
Page count: 15
Related papers
50 in total
  • [1] Learning to Weight for Text Classification
    Moreo, Alejandro
    Esuli, Andrea
    Sebastiani, Fabrizio
    [J]. IEEE Transactions on Knowledge and Data Engineering, 2020, 32 (02) : 302 - 316
  • [2] Contrastive learning with text augmentation for text classification
    Jia, Ouyang
    Huang, Huimin
    Ren, Jiaxin
    Xie, Luodi
    Xiao, Yinyin
    [J]. APPLIED INTELLIGENCE, 2023, 53 (16) : 19522 - 19531
  • [3] Contrastive learning with text augmentation for text classification
    Ouyang Jia
    Huimin Huang
    Jiaxin Ren
    Luodi Xie
    Yinyin Xiao
    [J]. Applied Intelligence, 2023, 53 : 19522 - 19531
  • [4] Text classification with active learning
    Novak, B
    Mladenic, D
    Grobelnik, M
    [J]. FROM DATA AND INFORMATION ANALYSIS TO KNOWLEDGE ENGINEERING, 2006, : 398 - +
  • [5] Mathematical Analysis on Weight Vectors in Text Classification
    Song, Fengxi
    Chen, Qinglong
    Guo, Zhongwei
    Gao, Xiumei
    [J]. 2012 THIRD GLOBAL CONGRESS ON INTELLIGENT SYSTEMS (GCIS 2012), 2012, : 148 - 151
  • [6] A Curriculum Learning Approach for Multi-Domain Text Classification Using Keyword Weight Ranking
    Yuan, Zilin
    Li, Yinghui
    Li, Yangning
    Zheng, Hai-Tao
    He, Yaobin
    Liu, Wenqiang
    Huang, Dongxiao
    Wu, Bei
    [J]. ELECTRONICS, 2023, 12 (14)
  • [7] Learning Category Distribution for Text Classification
    Wang, Xiangyu
    Zong, Chengqing
    [J]. ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2023, 22 (04)
  • [8] Learning label smoothing for text classification
    Ren, Han
    Zhao, Yajie
    Zhang, Yong
    Sun, Wei
    [J]. PeerJ Computer Science, 2024, 10
  • [9] Active Learning for Turkish Text Classification
    Sapci, Ali Osman Berk
    Tastan, Oznur
    Yeniterzi, Reyyan
    [J]. 2020 28TH SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE (SIU), 2020,
  • [10] Transfer Learning beyond Text Classification
    Yang, Qiang
    [J]. ADVANCES IN MACHINE LEARNING, PROCEEDINGS, 2009, 5828 : 10 - 22