Weakly supervised text classification framework for noisy-labeled imbalanced samples

Times Cited: 0
Authors
Zhang, Wenxin [1 ]
Zhou, Yaya [1 ]
Liu, Shuhui [2 ]
Zhang, Yupei [1 ]
Shang, Xuequn
Affiliations
[1] Northwestern Polytech Univ, Sch Comp Sci, Xian 710129, Shaanxi, Peoples R China
[2] Tsinghua Univ, Dept Automat, Beijing 100084, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Imbalanced data; Noisy label; Short-text classification; Weak supervision; Deep learning; Neural networks; Cost-sensitive matrix; ENSEMBLE; SMOTE;
DOI
10.1016/j.neucom.2024.128617
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The goal of this study is to solve the combined problem of noisy labels and imbalanced samples in text classification. Existing studies generally resort to data sampling or cleaning during model learning, which discards part of the available information. To this end, this paper introduces a weakly supervised text classification framework, dubbed WeStcoin, which aims to learn a clean hierarchical attention network directly from the given noisy-labeled imbalanced samples. Specifically, WeStcoin first vectorizes the given texts to generate a contextualized corpus, on which a pseudo-label vector is calculated by extracting seed words from each class and a predicted label vector is obtained from a hierarchical attention network. Based on the pseudo and predicted label vectors, we learn a cost-sensitive matrix that projects the concatenated label vectors into the given label space. WeStcoin is trained iteratively to reduce the difference between its output labels and the given noisy labels by alternately updating the network parameters, the set of seed words, and the cost-sensitive matrix. Finally, extensive experiments on short-text classification show that WeStcoin achieves a significant improvement over state-of-the-art models on imbalanced samples with noisy labels. Moreover, WeStcoin is more robust than the compared methods and provides potential explanations for the noisy labels.
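The forward pass described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: a plain dot-product similarity to class seed-word centroids stands in for the seed-word pseudo-labeling step, a random linear classifier stands in for the hierarchical attention network, and the cost-sensitive matrix is randomly initialized rather than learned; all array names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

# Toy setup: n short texts embedded in d dimensions, C classes.
n, d, C = 8, 16, 3
X = rng.normal(size=(n, d))               # contextualized text vectors
seed_centroids = rng.normal(size=(C, d))  # mean embedding of each class's seed words

# Pseudo-label vector: similarity of each text to each class's seed centroid.
pseudo = softmax(X @ seed_centroids.T)

# Predicted label vector: a random linear classifier stands in for the
# paper's hierarchical attention network.
W_net = rng.normal(size=(d, C))
pred = softmax(X @ W_net)

# Cost-sensitive matrix projects the concatenated (2C-dim) label vectors
# back into the C-dimensional label space.
M = rng.normal(size=(2 * C, C))
out = softmax(np.concatenate([pseudo, pred], axis=1) @ M)
```

In the full framework, `W_net` (the network), the seed-word set behind `seed_centroids`, and `M` would each be updated in turn to minimize the discrepancy between `out` and the given noisy labels.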
Pages: 12
Related Papers
50 records
  • [11] Deep Learning Network Intensification for Preventing Noisy-Labeled Samples for Remote Sensing Classification
    Lin, Chuang
    Guo, Shanxin
    Chen, Jinsong
    Sun, Luyi
    Zheng, Xiaorou
    Yang, Yan
    Xiong, Yingfei
    [J]. REMOTE SENSING, 2021, 13 (09)
  • [12] Knowledge Supervised Text Classification with No Labeled Documents
    Zhang, Congle
    Xue, Gui-Rong
    Yu, Yong
    [J]. PRICAI 2008: TRENDS IN ARTIFICIAL INTELLIGENCE, 2008, 5351 : 509 - +
  • [13] Performance of Classifiers on Noisy-Labeled Training Data: An Empirical Study on Handwritten Digit Classification Task
    Ahmad, Irfan
    [J]. ADVANCES IN COMPUTATIONAL INTELLIGENCE, IWANN 2019, PT II, 2019, 11507 : 414 - 425
  • [14] Imbalanced Classification Algorithm for Semi Supervised Text Learning (iCASSTLE)
    Banerjee, Debanjana
    Prabhat, Gyan
    Bhowal, Riyanka
    [J]. 2018 17TH IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA), 2018, : 1012 - 1016
  • [15] MINDFL: Mitigating the Impact of Imbalanced and Noisy-labeled Data in Federated Learning with Quality and Fairness-Aware Client Selection
    Zhang, Chaoyu
    Wang, Ning
    Shi, Shanghao
    Du, Changlai
    Lou, Wenjing
    Hou, Y. Thomas
    [J]. MILCOM 2023 - 2023 IEEE MILITARY COMMUNICATIONS CONFERENCE, 2023,
  • [16] Differences Between Hard and Noisy-labeled Samples: An Empirical Study
    Forouzesh, Mahsa
    Thiran, Patrick
    [J]. PROCEEDINGS OF THE 2024 SIAM INTERNATIONAL CONFERENCE ON DATA MINING, SDM, 2024, : 91 - 99
  • [17] Weakly Supervised Learning Framework Based on k Labeled Samples
    Fu, Zhi
    Wang, Hong-Jun
    Li, Tian-Rui
    Teng, Fei
    Zhang, Ji
    [J]. Ruan Jian Xue Bao/Journal of Software, 2020, 31 (04): : 981 - 990
  • [18] Deep Text Prior: Weakly Supervised Learning for Assertion Classification
    Liventsev, Vadim
    Fedulova, Irina
    Dylov, Dmitry
    [J]. ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2019: WORKSHOP AND SPECIAL SESSIONS, 2019, 11731 : 243 - 257
  • [19] Weakly-supervised Text Classification Based on Keyword Graph
    Zhang, Lu
    Ding, Jiandong
    Xu, Yi
    Liu, Yingyao
    Zhou, Shuigeng
    [J]. 2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), 2021, : 2803 - 2813
  • [20] A Submodular Optimization Framework for Imbalanced Text Classification With Data Augmentation
    Alemayehu, Eyor
    Fang, Yi
    [J]. IEEE ACCESS, 2023, 11 : 41680 - 41696