Weakly supervised text classification framework for noisy-labeled imbalanced

Cited by: 0
Authors
Zhang, Wenxin [1]
Zhou, Yaya [1]
Liu, Shuhui [2]
Zhang, Yupei [1]
Shang, Xuequn
Affiliations
[1] Northwestern Polytech Univ, Sch Comp Sci, Xian 710129, Shaanxi, Peoples R China
[2] Tsinghua Univ, Dept Automat, Beijing 100084, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Imbalanced data; Noisy label; Short-text classification; Weak supervision; Deep learning; Neural networks; Cost-sensitive matrix; ENSEMBLE; SMOTE
DOI
10.1016/j.neucom.2024.128617
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This study addresses the combined problem of noisy labels and imbalanced samples in text classification. Existing approaches generally apply data sampling or cleaning during model learning, which discards part of the information. To this end, this paper introduces a weakly supervised text classification framework, dubbed WeStcoin, which aims to learn a clear hierarchical attention network directly from the given noisy-labeled imbalanced samples. Specifically, WeStcoin first vectorizes the given texts to generate a contextualized corpus, on which a pseudo-label vector is calculated by extracting seed words from each class and a predicted label vector is obtained by a hierarchical attention network. Based on the pseudo and predicted label vectors, we learn a cost-sensitive matrix that projects the concatenated label vectors into the given label space. WeStcoin is trained iteratively to reduce the difference between the output labels and the given noisy labels by updating the network parameters, the set of seed words, and the cost-sensitive matrix, respectively. Finally, extensive experiments on short-text classification show that WeStcoin achieves a significant improvement over state-of-the-art models on imbalanced samples with noisy labels. In addition, WeStcoin is more robust than the compared methods and provides potential explanations for noisy labels.
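The label-combination step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the seed-word sets, the fixed `predicted` vector (a stand-in for the hierarchical attention network's output), and the hand-set `cost_matrix` are all assumptions made for illustration, and the iterative training that updates the network, the seed words, and the matrix is omitted.

```python
import math

def pseudo_label_vector(tokens, seed_words):
    """Share of seed-word hits per class; uniform if no seed word matches."""
    hits = [sum(tok in seeds for tok in tokens) for seeds in seed_words]
    total = sum(hits)
    k = len(seed_words)
    return [h / total for h in hits] if total else [1.0 / k] * k

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def project(pseudo, predicted, cost_matrix):
    """Map the concatenated label vectors (length 2K) to K label probabilities."""
    joint = pseudo + predicted
    scores = [sum(w * x for w, x in zip(row, joint)) for row in cost_matrix]
    return softmax(scores)

# Hypothetical two-class example (class 0: sports, class 1: finance).
seed_words = [{"goal", "match"}, {"stock", "market"}]
tokens = "the match ended with a late goal".split()

pseudo = pseudo_label_vector(tokens, seed_words)
predicted = [0.7, 0.3]  # assumed network output, not computed here

# Assumed K x 2K matrix; in the paper this matrix is learned, here it is fixed.
cost_matrix = [[1.0, 0.0, 1.0, 0.0],
               [0.0, 1.0, 0.0, 1.0]]
output = project(pseudo, predicted, cost_matrix)
```

In this toy setting the matrix simply adds the two label vectors class by class; a learned matrix could instead reweight classes, which is where the cost-sensitive treatment of imbalance would enter.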
Pages: 12
Related papers
50 in total
  • [31] WOT-Class: Weakly Supervised Open-world Text Classification
    Wang, Tianle
    Wang, Zihan
    Liu, Weitang
    Shang, Jingbo
    PROCEEDINGS OF THE 32ND ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, CIKM 2023, 2023, : 2666 - 2675
  • [32] MotifClass: Weakly Supervised Text Classification with Higher-order Metadata Information
    Zhang, Yu
    Garg, Shweta
    Meng, Yu
    Chen, Xiusi
    Han, Jiawei
    WSDM'22: PROCEEDINGS OF THE FIFTEENTH ACM INTERNATIONAL CONFERENCE ON WEB SEARCH AND DATA MINING, 2022, : 1357 - 1367
  • [33] Topics and Label Propagation: Best of Both Worlds for Weakly Supervised Text Classification
    Pawar, Sachin
    Ramrakhiyani, Nitin
    Hingmire, Swapnil
    Palshikar, Girish K.
    COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING, (CICLING 2016), PT II, 2018, 9624 : 446 - 459
  • [34] A weakly supervised textual entailment approach to zero-shot text classification
    Pamies, Marc
    Llop, Joan
    Multari, Francesco
    Duran-Silva, Nicolau
    Parra-Rojas, Cesar
    Gonzalez-Agirre, Aitor
    Massucci, Francesco Alessandro
    Villegas, Marta
    17TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EACL 2023, 2023, : 286 - 296
  • [35] A weakly supervised framework for real-world point cloud classification
    Deng, An
    Wu, Yunchao
    Zhang, Peng
    Lu, Zhuheng
    Li, Weiqing
    Su, Zhiyong
    COMPUTERS & GRAPHICS-UK, 2022, 102 : 78 - 88
  • [36] Supervised Microalgae Classification in Imbalanced Dataset
    Correa, Iago
    Drews-Jr, Paulo
    de Souza, Marcio Silva
    Tavano, Virginia Maria
    PROCEEDINGS OF 2016 5TH BRAZILIAN CONFERENCE ON INTELLIGENT SYSTEMS (BRACIS 2016), 2016, : 49 - 54
  • [37] A Weakly Supervised Semantic Segmentation Framework for Medium-Resolution Forest Classification With Noisy Labels and GF-1 WFV Images
    Peng, Xueli
    He, Guojin
    Wang, Guizhou
    Yin, Ranyu
    Wang, Jianping
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62 : 1 - 1
  • [38] Software quality classification with imbalanced and noisy data
    Folleco, Andres
    Khoshgoftaar, Taghi M.
    Van Hulse, Jason
    THIRTEENTH ISSAT INTERNATIONAL CONFERENCE ON RELIABILITY AND QUALITY IN DESIGN, PROCEEDINGS, 2007, : 191 - +
  • [39] A NOVEL WEAKLY SUPERVISED FRAMEWORK BASED ON NOISY-LABEL LEARNING FOR MEDICAL IMAGE SEGMENTATION
    Wu, Haoyang
    Wang, Huan
    He, Hao
    He, Zixiao
    Wang, Guotai
    2021 IEEE 18TH INTERNATIONAL SYMPOSIUM ON BIOMEDICAL IMAGING (ISBI), 2021, : 1768 - 1772
  • [40] Electrical Fault Diagnosis From Text Data: A Supervised Sentence Embedding Combined With Imbalanced Classification
    Jing, Xiao
    Wu, Zhiang
    Zhang, Lu
    Li, Zhe
    Mu, Dejun
    IEEE TRANSACTIONS ON INDUSTRIAL ELECTRONICS, 2024, 71 (03) : 3064 - 3073