Weakly supervised text classification framework for noisy-labeled imbalanced

Authors
Zhang, Wenxin [1]
Zhou, Yaya [1]
Liu, Shuhui [2]
Zhang, Yupei [1]
Shang, Xuequn
Affiliations
[1] Northwestern Polytech Univ, Sch Comp Sci, Xian 710129, Shaanxi, Peoples R China
[2] Tsinghua Univ, Dept Automat, Beijing 100084, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Imbalanced data; Noisy label; Short-text classification; Weak supervision; Deep learning; Neural networks; Cost-sensitive matrix; ENSEMBLE; SMOTE;
DOI
10.1016/j.neucom.2024.128617
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The goal of this study is to solve the combined problem of noisy labels and imbalanced samples in text classification. Current studies generally apply data sampling or cleaning during model learning, which loses part of the information. To this end, this paper introduces a weakly supervised text classification framework, dubbed WeStcoin, which aims to learn a clear hierarchical attention network directly from the given noisy-labeled imbalanced samples. Specifically, WeStcoin first vectorizes the given texts to generate a contextualized corpus, on which the pseudo-label vector is calculated by extracting seed words from each class and the predicted label vector is obtained by a hierarchical attention network. Based on the pseudo and predicted label vectors, we learn a cost-sensitive matrix to project the concatenated label vectors into the given label space. WeStcoin is trained iteratively to reduce the difference between the output labels and the given noisy labels by updating the network parameters, the set of seed words, and the cost-sensitive matrix, respectively. Finally, extended experiments on short-text classification show that WeStcoin significantly outperforms the state-of-the-art models on imbalanced samples with noisy labels. In addition, WeStcoin is more robust than the compared methods and provides potential explanations for noisy labels.
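The projection step described in the abstract can be pictured with a small forward-pass sketch. This is an illustration under stated assumptions, not the authors' implementation: the abstract gives no formulas, so the seed-word counting rule, the softmax normalization, the cross-entropy loss, and every name below (`pseudo_label_vector`, `project`, `cost_matrix`, the toy seed words and values) are hypothetical. The hierarchical attention network is replaced by a fixed stand-in vector, and no parameter updates are shown.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def pseudo_label_vector(tokens, seed_words):
    """Assumed rule: count seed-word hits per class, then normalize."""
    counts = [float(sum(tok in seeds for tok in tokens)) for seeds in seed_words]
    return softmax(counts)

def project(pseudo, predicted, cost_matrix):
    """Map the concatenated [pseudo; predicted] vector into the label space."""
    concat = pseudo + predicted
    logits = [sum(w * x for w, x in zip(row, concat)) for row in cost_matrix]
    return softmax(logits)

def cross_entropy(output, noisy_label):
    """Assumed loss between the projected output and the given noisy label index."""
    return -math.log(output[noisy_label])

# Toy two-class setup; every value here is illustrative.
seed_words = [{"goal", "match"}, {"stock", "market"}]
tokens = ["the", "stock", "market", "fell"]
predicted = [0.4, 0.6]  # stand-in for the attention network's output
cost_matrix = [          # hypothetical C x 2C matrix (C = 2 classes)
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
]

pseudo = pseudo_label_vector(tokens, seed_words)
output = project(pseudo, predicted, cost_matrix)
loss = cross_entropy(output, noisy_label=1)
```

In an iterative scheme like the one the abstract outlines, a loss of this kind would drive updates to the network parameters, the seed-word sets, and the matrix in turn; the sketch only computes the loss once.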
Pages: 12
Related papers
50 in total
  • [21] MetaCleaner: Learning to Hallucinate Clean Representations for Noisy-Labeled Visual Recognition
    Zhang, Weihe
    Wang, Yali
    Qiao, Yu
    [J]. 2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 7365 - 7374
  • [22] Efficient Path Prediction for Semi-Supervised and Weakly Supervised Hierarchical Text Classification
    Xiao, Huiru
    Liu, Xin
    Song, Yangqiu
    [J]. WEB CONFERENCE 2019: PROCEEDINGS OF THE WORLD WIDE WEB CONFERENCE (WWW 2019), 2019, : 3370 - 3376
  • [23] Text Generation for Imbalanced Text Classification
    Akkaradamrongrat, Suphamongkol
    Kachamas, Pornpimon
    Sinthupinyo, Sukree
    [J]. 2019 16TH INTERNATIONAL JOINT CONFERENCE ON COMPUTER SCIENCE AND SOFTWARE ENGINEERING (JCSSE 2019), 2019, : 181 - 186
  • [24] A Novel Imbalanced Data Classification Method Based on Weakly Supervised Learning for Fault Diagnosis
    Liu, Hui
    Liu, Zhenyu
    Jia, Weiqiang
    Zhang, Donghao
    Tan, Jianrong
    [J]. IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, 2022, 18 (03) : 1583 - 1593
  • [25] WRND: A weighted oversampling framework with relative neighborhood density for imbalanced noisy classification
    Li, Min
    Zhou, Hao
    Liu, Qun
    Gong, Xu
    Wang, Guoyin
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2024, 241
  • [26] EnvBERT: Multi-label Text Classification for Imbalanced, Noisy Environmental News Data
    Kim, Dohyung
    Koo, Jahwan
    Kim, Ung-Mo
    [J]. PROCEEDINGS OF THE 2021 15TH INTERNATIONAL CONFERENCE ON UBIQUITOUS INFORMATION MANAGEMENT AND COMMUNICATION (IMCOM 2021), 2021,
  • [27] Uncertainty-aware iterative learning for noisy-labeled medical image segmentation
    Hao, Pengyi
    Shi, Kangjian
    Tian, Shuyuan
    Wu, Fuli
    [J]. IET IMAGE PROCESSING, 2023, 17 (13) : 3830 - 3840
  • [28] Pick-and-Learn: Automatic Quality Evaluation for Noisy-Labeled Image Segmentation
    Zhu, Haidong
    Shi, Jialin
    Wu, Ji
    [J]. MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION - MICCAI 2019, PT VI, 2019, 11769 : 576 - 584
  • [29] Self-supervised Attention Model for Weakly Labeled Audio Event Classification
    Kim, Bongjun
    Ghaffarzadegan, Shabnam
    [J]. 2019 27TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2019,
  • [30] A Weakly Supervised Learning-Based Oversampling Framework for Class-Imbalanced Fault Diagnosis
    Qian, Min
    Li, Yan-Fu
    [J]. IEEE TRANSACTIONS ON RELIABILITY, 2022, 71 (01) : 429 - 442