A Neural Expectation-Maximization Framework for Noisy Multi-Label Text Classification

Cited by: 2
Authors
Chen, Junfan [1 ,2 ]
Zhang, Richong [1 ,3 ]
Xu, Jie [4 ]
Hu, Chunming [1 ,3 ,5 ]
Mao, Yongyi [6 ]
Affiliations
[1] Beihang Univ, Sch Comp Sci & Engn, SKLSDE, Beijing 100191, Peoples R China
[2] Beihang Univ, Sch Software, Beijing 100191, Peoples R China
[3] Zhongguancun Lab, Beijing 100190, Peoples R China
[4] Univ Leeds, Leeds LS2 9JT, England
[5] Beihang Univ, Sch Software, Beijing 100191, Peoples R China
[6] Univ Ottawa, Ottawa, ON K1N 6N5, Canada
Funding
National Key Research and Development Program of China
Keywords
Multi-label text classification; noise label; expectation maximization; neural networks;
DOI
10.1109/TKDE.2022.3223067
CLC Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Multi-label text classification (MLTC) has a wide range of real-world applications. Neural networks have recently improved the performance of MLTC models, but training these models relies on sufficient, accurately labelled data. Manually annotating large-scale multi-label text classification datasets is expensive and impractical for many applications, so weak supervision techniques have been developed to reduce the cost of annotating text corpora. However, these techniques introduce noisy labels into the training data and may degrade model performance. This paper addresses such noisy-label problems in MLTC in both single-instance and multi-instance settings. We build a novel Neural Expectation-Maximization framework (nEM) that combines neural networks with probabilistic modelling. The nEM framework produces text representations using neural-network text encoders and is optimized with the Expectation-Maximization algorithm. It naturally accounts for noisy labels during learning by iteratively updating the model parameters and estimating the distribution of the ground-truth labels. We evaluate our nEM framework on multi-instance noisy MLTC using a benchmark relation-extraction dataset constructed by distant supervision, and on single-instance noisy MLTC using synthetic noisy datasets constructed by keyword supervision and label flipping. The experimental results demonstrate that nEM significantly improves upon baseline models in both single-instance and multi-instance noisy MLTC tasks. The experimental analysis suggests that our nEM framework efficiently reduces the noisy labels in MLTC datasets and significantly improves model performance.
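The EM loop the abstract describes — treating the ground-truth labels as latent, estimating their distribution in the E-step, and updating model parameters in the M-step — can be sketched as follows. This is a minimal, hypothetical NumPy illustration, not the authors' implementation: a plain logistic model stands in for the neural text encoder, and a known per-label flip rate `eps` stands in for the paper's noise model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for encoded documents: 200 samples, 10 features, 3 labels
X = rng.normal(size=(200, 10))
W_true = rng.normal(size=(10, 3))
clean = (X @ W_true > 0).astype(float)       # latent ground-truth labels
flip = rng.random(clean.shape) < 0.2         # 20% of labels are flipped
noisy = np.abs(clean - flip)                 # observed noisy labels

eps = 0.2                                    # assumed known flip probability
W = np.zeros((10, 3))                        # model parameters

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for _ in range(30):                          # EM iterations
    # E-step: posterior over the latent clean label given the noisy one,
    # combining the model's current belief with the noise likelihood
    p1 = sigmoid(X @ W)                      # model belief that y = 1
    like1 = np.where(noisy == 1, 1 - eps, eps)   # p(noisy | y = 1)
    like0 = np.where(noisy == 0, 1 - eps, eps)   # p(noisy | y = 0)
    q = p1 * like1 / (p1 * like1 + (1 - p1) * like0)
    # M-step: a few gradient steps of logistic regression on soft targets q
    for _ in range(20):
        grad = X.T @ (sigmoid(X @ W) - q) / len(X)
        W -= 0.5 * grad

acc = ((sigmoid(X @ W) > 0.5) == clean).mean()   # agreement with clean labels
```

The key design point mirrored here is that the E-step denoises the supervision signal before each parameter update, so the model is never fit directly to the raw noisy labels.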
Pages: 10992-11003 (12 pages)