Weakly supervised text classification framework for noisy-labeled imbalanced

Cited by: 0
Authors
Zhang, Wenxin [1 ]
Zhou, Yaya [1 ]
Liu, Shuhui [2 ]
Zhang, Yupei [1 ]
Shang, Xuequn
Affiliations
[1] Northwestern Polytech Univ, Sch Comp Sci, Xian 710129, Shaanxi, Peoples R China
[2] Tsinghua Univ, Dept Automat, Beijing 100084, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Imbalanced data; Noisy label; Short-text classification; Weak supervision; Deep learning; Neural networks; Cost-sensitive matrix; ENSEMBLE; SMOTE;
DOI
10.1016/j.neucom.2024.128617
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The goal of this study is to solve the combined problem of noisy labels and imbalanced samples in text classification. Current studies generally rely on data sampling or cleaning during model learning, which discards part of the information. To this end, this paper introduces a weakly supervised text classification framework, dubbed WeStcoin, which aims to learn a clear hierarchical attention network directly from the given noisy-labeled imbalanced samples. Specifically, WeStcoin first vectorizes the given texts to generate a contextualized corpus, on which the pseudo-label vector is calculated by extracting seed words from each class and the predicted label vector is obtained by a hierarchical attention network. Based on the pseudo and predicted label vectors, we learn a cost-sensitive matrix to project the concatenated label vectors into the given label space. WeStcoin is trained iteratively to reduce the difference between the output labels and the given noisy labels by updating the network parameters, the set of seed words, and the cost-sensitive matrix. Finally, extensive experiments on short-text classification show that WeStcoin achieves a significant improvement over state-of-the-art models on imbalanced samples with noisy labels. In addition, WeStcoin is more robust than the compared methods and provides potential explanations for noisy labels.
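The projection step described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the seed-word counting rule, the softmax, the example seed words, the matrix values, and every function name below are assumptions made for illustration, and the hierarchical attention network is replaced by a fixed probability vector.

```python
import math

def pseudo_label_vector(tokens, seed_words):
    """Share of seed-word hits per class (assumed rule); uniform if none match."""
    counts = [sum(1 for t in tokens if t in seeds) for seeds in seed_words]
    total = sum(counts)
    k = len(seed_words)
    if total == 0:
        return [1.0 / k] * k
    return [c / total for c in counts]

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def project(cost_matrix, pseudo, predicted):
    """Map the concatenated label vectors (length 2K) to K class scores."""
    concat = pseudo + predicted
    scores = [sum(w * x for w, x in zip(row, concat)) for row in cost_matrix]
    return softmax(scores)

# Hypothetical two-class setup: class 0 = sports, class 1 = finance.
seed_words = [{"goal", "match"}, {"stock", "market"}]
tokens = "late goal wins the match".split()

pseudo = pseudo_label_vector(tokens, seed_words)
predicted = [0.7, 0.3]  # stand-in for the attention network's output

# Hypothetical K x 2K matrix; the larger weights on class 1 mimic
# up-weighting a minority class. In the paper this matrix is learned.
cost_matrix = [
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 2.0, 0.0, 2.0],
]
output = project(cost_matrix, pseudo, predicted)
```

In the framework as described, the matrix, the seed words, and the network parameters are all updated during training so that the output moves toward the given noisy labels; the sketch only shows a single forward pass with fixed values.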
Pages: 12