Weakly supervised text classification framework for noisy-labeled imbalanced samples

Times Cited: 0
Authors
Zhang, Wenxin [1 ]
Zhou, Yaya [1 ]
Liu, Shuhui [2 ]
Zhang, Yupei [1 ]
Shang, Xuequn
Affiliations
[1] Northwestern Polytech Univ, Sch Comp Sci, Xian 710129, Shaanxi, Peoples R China
[2] Tsinghua Univ, Dept Automat, Beijing 100084, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Imbalanced data; Noisy label; Short-text classification; Weak supervision; Deep learning; Neural networks; Cost-sensitive matrix; ENSEMBLE; SMOTE;
DOI
10.1016/j.neucom.2024.128617
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The goal of this study is to solve the combined problem of noisy labels and imbalanced samples in text classification. Existing studies generally resort to data sampling or cleaning during model learning, which discards part of the available information. To this end, this paper introduces a weakly supervised text classification framework, dubbed WeStcoin, which aims to learn a clean hierarchical attention network directly from the given noisy-labeled imbalanced samples. Specifically, WeStcoin first vectorizes the given texts to generate a contextualized corpus, on which a pseudo-label vector is calculated by extracting seed words from each class and a predicted label vector is obtained from a hierarchical attention network. Based on the pseudo and predicted label vectors, we learn a cost-sensitive matrix that projects the concatenated label vectors into the given label space. WeStcoin is trained iteratively to reduce the difference between its output labels and the given noisy labels by alternately updating the network parameters, the set of seed words, and the cost-sensitive matrix. Finally, extensive experiments on short-text classification show that WeStcoin achieves a significant improvement over state-of-the-art models on imbalanced samples with noisy labels. Moreover, WeStcoin is more robust than the compared methods and provides potential explanations for the noisy labels.
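The forward pass described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: a plain dot-product similarity to class seed-word centroids stands in for the seed-word pseudo-labeling step, a random linear classifier stands in for the hierarchical attention network, and the cost-sensitive matrix is randomly initialized rather than learned; all array names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

# Toy setup: n short texts embedded in d dimensions, C classes.
n, d, C = 8, 16, 3
X = rng.normal(size=(n, d))               # contextualized text vectors
seed_centroids = rng.normal(size=(C, d))  # mean embedding of each class's seed words

# Pseudo-label vector: similarity of each text to each class's seed centroid.
pseudo = softmax(X @ seed_centroids.T)

# Predicted label vector: a random linear classifier stands in for the
# paper's hierarchical attention network.
W_net = rng.normal(size=(d, C))
pred = softmax(X @ W_net)

# Cost-sensitive matrix projects the concatenated (2C-dim) label vectors
# back into the C-dimensional label space.
M = rng.normal(size=(2 * C, C))
out = softmax(np.concatenate([pseudo, pred], axis=1) @ M)
```

In the full framework, `W_net` (the network), the seed-word set behind `seed_centroids`, and `M` would each be updated in turn to minimize the discrepancy between `out` and the given noisy labels.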
Pages: 12
Related Papers
50 records
  • [11] Deep Learning Network Intensification for Preventing Noisy-Labeled Samples for Remote Sensing Classification
    Lin, Chuang
    Guo, Shanxin
    Chen, Jinsong
    Sun, Luyi
    Zheng, Xiaorou
    Yang, Yan
    Xiong, Yingfei
    [J]. REMOTE SENSING, 2021, 13 (09)
  • [12] Knowledge Supervised Text Classification with No Labeled Documents
    Zhang, Congle
    Xue, Gui-Rong
    Yu, Yong
    [J]. PRICAI 2008: TRENDS IN ARTIFICIAL INTELLIGENCE, 2008, 5351 : 509 - +
  • [13] Performance of Classifiers on Noisy-Labeled Training Data: An Empirical Study on Handwritten Digit Classification Task
    Ahmad, Irfan
    [J]. ADVANCES IN COMPUTATIONAL INTELLIGENCE, IWANN 2019, PT II, 2019, 11507 : 414 - 425
  • [14] Imbalanced Classification Algorithm for Semi Supervised Text Learning (iCASSTLE)
    Banerjee, Debanjana
    Prabhat, Gyan
    Bhowal, Riyanka
    [J]. 2018 17TH IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA), 2018, : 1012 - 1016
  • [15] MINDFL: Mitigating the Impact of Imbalanced and Noisy-labeled Data in Federated Learning with Quality and Fairness-Aware Client Selection
    Zhang, Chaoyu
    Wang, Ning
    Shi, Shanghao
    Du, Changlai
    Lou, Wenjing
    Hou, Y. Thomas
    [J]. MILCOM 2023 - 2023 IEEE MILITARY COMMUNICATIONS CONFERENCE, 2023,
  • [16] Differences Between Hard and Noisy-labeled Samples: An Empirical Study
    Forouzesh, Mahsa
    Thiran, Patrick
    [J]. PROCEEDINGS OF THE 2024 SIAM INTERNATIONAL CONFERENCE ON DATA MINING, SDM, 2024, : 91 - 99
  • [17] Weakly Supervised Learning Framework Based on k Labeled Samples
    Fu, Zhi
    Wang, Hong-Jun
    Li, Tian-Rui
    Teng, Fei
    Zhang, Ji
    [J]. Ruan Jian Xue Bao/Journal of Software, 2020, 31 (04): : 981 - 990
  • [18] Deep Text Prior: Weakly Supervised Learning for Assertion Classification
    Liventsev, Vadim
    Fedulova, Irina
    Dylov, Dmitry
    [J]. ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2019: WORKSHOP AND SPECIAL SESSIONS, 2019, 11731 : 243 - 257
  • [19] Weakly-supervised Text Classification Based on Keyword Graph
    Zhang, Lu
    Ding, Jiandong
    Xu, Yi
    Liu, Yingyao
    Zhou, Shuigeng
    [J]. 2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), 2021, : 2803 - 2813
  • [20] A Submodular Optimization Framework for Imbalanced Text Classification With Data Augmentation
    Alemayehu, Eyor
    Fang, Yi
    [J]. IEEE ACCESS, 2023, 11 : 41680 - 41696