Weakly supervised text classification framework for noisy-labeled imbalanced

Cited by: 0
Authors
Zhang, Wenxin [1]
Zhou, Yaya [1]
Liu, Shuhui [2]
Zhang, Yupei [1]
Shang, Xuequn
Affiliations
[1] Northwestern Polytech Univ, Sch Comp Sci, Xian 710129, Shaanxi, Peoples R China
[2] Tsinghua Univ, Dept Automat, Beijing 100084, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Imbalanced data; Noisy label; Short-text classification; Weak supervision; Deep learning; Neural networks; Cost-sensitive matrix; ENSEMBLE; SMOTE;
DOI
10.1016/j.neucom.2024.128617
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This study addresses the combined problem of noisy labels and imbalanced samples in text classification. Current studies generally rely on data sampling or cleaning during model learning, which loses part of the information. To this end, this paper introduces a weakly supervised text classification framework, dubbed WeStcoin, which aims to learn a clear hierarchical attention network directly from the given noisy-labeled imbalanced samples. Specifically, WeStcoin first vectorizes the given texts to generate a contextualized corpus, on which the pseudo-label vector is calculated by extracting seed words from each class and the predicted label vector is obtained by a hierarchical attention network. Based on the pseudo and predicted label vectors, we learn a cost-sensitive matrix to project the concatenated label vectors into the given label space. WeStcoin is trained iteratively to reduce the difference between the output labels and the given noisy labels by updating the network parameters, the set of seed words, and the cost-sensitive matrix, respectively. Finally, extended experiments on short-text classification show that WeStcoin achieves a significant improvement over state-of-the-art models on imbalanced samples with noisy labels. In addition, WeStcoin is more robust than the compared methods and provides potential explanations for noisy labels.
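The label-combination step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names, the count-based pseudo-label rule, the softmax output, and the hand-set matrix values are all assumptions chosen for illustration, and the predicted label vector is a fixed stand-in for the output of the hierarchical attention network. In the paper the seed words, network parameters, and cost-sensitive matrix are learned iteratively; here they are simply given.

```python
import math

def pseudo_label(tokens, seed_words):
    """Assumed rule: normalized seed-word hit counts per class (uniform if no hits)."""
    counts = [sum(tok in seeds for tok in tokens) for seeds in seed_words]
    total = sum(counts)
    k = len(seed_words)
    return [c / total for c in counts] if total else [1.0 / k] * k

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def project(pseudo, predicted, cost_matrix):
    """Map the concatenated [pseudo; predicted] vector (length 2K) to K label probabilities."""
    concat = pseudo + predicted
    scores = [sum(w * x for w, x in zip(row, concat)) for row in cost_matrix]
    return softmax(scores)

# Hypothetical two-class example (class 0: sports, class 1: finance).
seed_words = [{"goal", "match"}, {"stock", "market"}]
tokens = "the match ended with a late goal".split()

pseudo = pseudo_label(tokens, seed_words)
predicted = [0.7, 0.3]  # stand-in for the attention network's output
# Hand-set K x 2K matrix; the second row up-weights the assumed minority class 1.
cost_matrix = [[1.0, 0.0, 1.0, 0.0],
               [0.0, 2.0, 0.0, 2.0]]
output = project(pseudo, predicted, cost_matrix)
print(pseudo, output)
```

In a trained system the difference between `output` and the given noisy label would drive updates to all three components; the sketch only shows how the two label vectors and the matrix fit together in a single forward pass.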
Pages: 12