A Neural Expectation-Maximization Framework for Noisy Multi-Label Text Classification

Cited by: 2
Authors
Chen, Junfan [1 ,2 ]
Zhang, Richong [1 ,3 ]
Xu, Jie [4 ]
Hu, Chunming [1 ,3 ,5 ]
Mao, Yongyi [6 ]
Affiliations
[1] Beihang Univ, Sch Comp Sci & Engn, SKLSDE, Beijing 100191, Peoples R China
[2] Beihang Univ, Sch Software, Beijing 100191, Peoples R China
[3] Zhongguancun Lab, Beijing 100190, Peoples R China
[4] Univ Leeds, Leeds LS2 9JT, England
[5] Beihang Univ, Sch Software, Beijing 100191, Peoples R China
[6] Univ Ottawa, Ottawa, ON K1N 6N5, Canada
Funding
National Key Research and Development Program of China
Keywords
Multi-label text classification; noise label; expectation maximization; neural networks;
DOI
10.1109/TKDE.2022.3223067
CLC Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Multi-label text classification (MLTC) has a wide range of real-world applications. Neural networks have recently improved the performance of MLTC models, but training these models relies on sufficient, accurately labelled data. Manually annotating large-scale multi-label text classification datasets is expensive and impractical for many applications, so weak supervision techniques have been developed to reduce the cost of annotating text corpora. However, these techniques introduce noisy labels into the training data and may degrade model performance. This paper addresses such noisy-label problems in MLTC in both single-instance and multi-instance settings. We build a novel Neural Expectation-Maximization framework (nEM) that combines neural networks with probabilistic modelling. The nEM framework produces text representations using neural-network text encoders and is optimized with the Expectation-Maximization algorithm. It naturally accounts for noisy labels during learning by iteratively updating the model parameters and estimating the distribution of the ground-truth labels. We evaluate our nEM framework on multi-instance noisy MLTC using a benchmark relation-extraction dataset constructed by distant supervision, and on single-instance noisy MLTC using synthetic noisy datasets constructed by keyword supervision and label flipping. The experimental results demonstrate that nEM significantly improves upon baseline models in both single-instance and multi-instance noisy MLTC tasks. The experimental analysis suggests that our nEM framework efficiently reduces the noisy labels in MLTC datasets and significantly improves model performance.
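The EM loop the abstract describes — treating the ground-truth labels as latent, estimating their distribution in the E-step, and updating model parameters in the M-step — can be sketched as follows. This is a minimal, hypothetical NumPy illustration, not the authors' implementation: a plain logistic model stands in for the neural text encoder, and a known per-label flip rate `eps` stands in for the paper's noise model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for encoded documents: 200 samples, 10 features, 3 labels
X = rng.normal(size=(200, 10))
W_true = rng.normal(size=(10, 3))
clean = (X @ W_true > 0).astype(float)       # latent ground-truth labels
flip = rng.random(clean.shape) < 0.2         # 20% of labels are flipped
noisy = np.abs(clean - flip)                 # observed noisy labels

eps = 0.2                                    # assumed known flip probability
W = np.zeros((10, 3))                        # model parameters

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for _ in range(30):                          # EM iterations
    # E-step: posterior over the latent clean label given the noisy one,
    # combining the model's current belief with the noise likelihood
    p1 = sigmoid(X @ W)                      # model belief that y = 1
    like1 = np.where(noisy == 1, 1 - eps, eps)   # p(noisy | y = 1)
    like0 = np.where(noisy == 0, 1 - eps, eps)   # p(noisy | y = 0)
    q = p1 * like1 / (p1 * like1 + (1 - p1) * like0)
    # M-step: a few gradient steps of logistic regression on soft targets q
    for _ in range(20):
        grad = X.T @ (sigmoid(X @ W) - q) / len(X)
        W -= 0.5 * grad

acc = ((sigmoid(X @ W) > 0.5) == clean).mean()   # agreement with clean labels
```

The key design point mirrored here is that the E-step denoises the supervision signal before each parameter update, so the model is never fit directly to the raw noisy labels.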
Pages: 10992-11003 (12 pages)