Residual diverse ensemble for long-tailed multi-label text classification

Times cited: 1

Authors:

Shi, Jiangxin [1,2]

Wei, Tong [3,4]

Li, Yufeng [1,2]

Affiliations:

[1] Nanjing Univ, Natl Key Lab Novel Software Technol, Nanjing 210023, Peoples R China

[2] Nanjing Univ, Sch Artificial Intelligence, Nanjing 210023, Peoples R China

[3] Southeast Univ, Sch Comp Sci & Engn, Nanjing 210096, Peoples R China

[4] Southeast Univ, Key Lab Comp Network & Informat Integrat, Minist Educ, Nanjing 210096, Peoples R China

Source:

SCIENCE CHINA-INFORMATION SCIENCES | 2024 / Vol. 67 / No. 11

Funding:

National Key Research and Development Program of China; National Natural Science Foundation of China;

Keywords:

multi-label learning; extreme multi-label learning; long-tailed distribution; multi-label text classification; ensemble learning;

DOI:

10.1007/s11432-022-3915-6

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

Long-tailed multi-label text classification aims to identify a subset of relevant labels from a large candidate label set, where the training data usually follow a long-tailed label distribution. Many previous studies treat head and tail labels equally, which leads to unsatisfactory performance on tail labels. To address this issue, this paper proposes a novel learning method that can be combined with arbitrary base models and consists of two steps. The first step, "diverse ensemble," encourages diverse predictions among multiple shallow classifiers, particularly on tail labels, and can improve generalization on tail labels. The second step, "error correction," takes advantage of the base model's accurate predictions on head labels and approximates its residual errors on tail labels. This lets the diverse ensemble focus on optimizing tail-label performance. The overall procedure is called residual diverse ensemble (RDE). RDE is implemented via a single-hidden-layer perceptron and can scale up to hundreds of thousands of labels. We empirically show that RDE consistently improves many existing models, with considerable performance gains on benchmark datasets, especially on propensity-scored evaluation metrics. Moreover, RDE converges in fewer than 30 training epochs without increasing the computational overhead.
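The abstract does not give architectural details, so the following is only a minimal NumPy sketch of the residual-ensemble idea under stated assumptions, not the authors' implementation. It assumes a frozen base model that supplies logits (`base_logits`) and K hypothetical single-hidden-layer heads (ReLU activation, separate hidden layer per head; whether the paper shares the hidden layer is not stated in the abstract). The heads' averaged outputs are added to the base logits, so the heads only need to model the residual. The names `ResidualEnsembleSketch`, `head_outputs`, `disagreement`, and `bce_with_logits` are illustrative. The paper's actual diversity-promoting objective and any tail-label weighting are not reproduced; `disagreement` is only a hypothetical diagnostic of how much the heads differ.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def bce_with_logits(logits, Y):
    # Numerically stable binary cross-entropy, averaged over all (example, label) entries.
    return np.mean(np.maximum(logits, 0.0) - logits * Y + np.log1p(np.exp(-np.abs(logits))))


class ResidualEnsembleSketch:
    """Hypothetical sketch: K shallow heads whose mean output corrects a frozen base model."""

    def __init__(self, in_dim, hidden_dim, num_labels, num_heads, seed=0):
        rng = np.random.default_rng(seed)
        # Assumption: each head has its own independently initialized hidden layer.
        self.W1 = rng.normal(0.0, 0.1, size=(num_heads, in_dim, hidden_dim))
        self.b1 = np.zeros((num_heads, 1, hidden_dim))
        self.W2 = rng.normal(0.0, 0.1, size=(num_heads, hidden_dim, num_labels))
        self.b2 = np.zeros((num_heads, 1, num_labels))

    def head_outputs(self, X):
        # X: (n, in_dim) -> residual logits per head, shape (K, n, num_labels).
        H = relu(np.einsum("nd,kdh->knh", X, self.W1) + self.b1)
        return np.einsum("knh,khl->knl", H, self.W2) + self.b2

    def predict_logits(self, X, base_logits):
        # Residual combination: base logits plus the averaged head corrections.
        return base_logits + self.head_outputs(X).mean(axis=0)

    @staticmethod
    def disagreement(head_out):
        # Per-label variance across heads, averaged over examples (hypothetical diversity proxy).
        return head_out.var(axis=0).mean(axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(4, 8))           # toy features for 4 documents
    base = rng.normal(size=(4, 5))        # toy logits from a frozen base model, 5 labels
    Y = (rng.random((4, 5)) < 0.3).astype(float)
    model = ResidualEnsembleSketch(in_dim=8, hidden_dim=16, num_labels=5, num_heads=3)
    logits = model.predict_logits(X, base)
    print(logits.shape)  # (4, 5)
    print(bce_with_logits(logits, Y))
    print(ResidualEnsembleSketch.disagreement(model.head_outputs(X)))
```

Training is left out: one would minimize a loss such as `bce_with_logits` on the combined logits with respect to the head parameters only, keeping the base model fixed. Because the base logits enter additively, zeroing the heads' output weights recovers the base model's predictions exactly, which is what makes the heads a residual correction rather than a replacement.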


Pages: 14
