Residual diverse ensemble for long-tailed multi-label text classification

Cited by: 1
Authors
Shi, Jiangxin [1, 2]
Wei, Tong [3, 4]
Li, Yufeng [1, 2]
Affiliations
[1] Nanjing Univ, Natl Key Lab Novel Software Technol, Nanjing 210023, Peoples R China
[2] Nanjing Univ, Sch Artificial Intelligence, Nanjing 210023, Peoples R China
[3] Southeast Univ, Sch Comp Sci & Engn, Nanjing 210096, Peoples R China
[4] Southeast Univ, Key Lab Comp Network & Informat Integrat, Minist Educ, Nanjing 210096, Peoples R China
Funding
National Key Research and Development Program of China; National Natural Science Foundation of China
Keywords
multi-label learning; extreme multi-label learning; long-tailed distribution; multi-label text classification; ensemble learning
DOI
10.1007/s11432-022-3915-6
Chinese Library Classification number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Long-tailed multi-label text classification aims to identify a subset of relevant labels from a large candidate label set, where the training data usually follow a long-tailed label distribution. Many previous studies treat head and tail labels equally, which leads to unsatisfactory performance on tail labels. To address this issue, this paper proposes a novel learning method that can be combined with arbitrary base models and consists of two steps. The first step, the "diverse ensemble", encourages diverse predictions among multiple shallow classifiers, particularly on tail labels, and can thereby improve generalization on tail labels. The second step, the "error correction", takes advantage of the base model's accurate predictions on head labels and approximates its residual errors on tail labels, which lets the "diverse ensemble" focus on optimizing tail label performance. The overall procedure is called residual diverse ensemble (RDE). RDE is implemented via a single-hidden-layer perceptron and scales to hundreds of thousands of labels. We empirically show that RDE consistently improves many existing models with considerable performance gains on benchmark datasets, especially with respect to propensity-scored evaluation metrics. Moreover, RDE converges in fewer than 30 training epochs without increasing the computational overhead.
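The two steps described in the abstract can be pictured with a small forward-pass sketch. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say how the shallow classifiers are combined or trained, so the averaging rule, the ReLU activation, the parameter layout, and the names `init_head`, `head_forward`, `residual_targets`, and `rde_scores` are all hypothetical.

```python
import numpy as np


def init_head(rng, n_features, n_hidden, n_labels):
    """Create one single-hidden-layer perceptron (hypothetical parameter layout)."""
    return {
        "W1": rng.normal(0.0, 0.1, size=(n_features, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.1, size=(n_hidden, n_labels)),
        "b2": np.zeros(n_labels),
    }


def head_forward(head, x):
    """Score every label with one shallow classifier (ReLU is an assumption)."""
    hidden = np.maximum(0.0, x @ head["W1"] + head["b1"])
    return hidden @ head["W2"] + head["b2"]


def residual_targets(y_true, base_scores):
    """Residual errors of the base model, which the heads would be fit to approximate."""
    return y_true - base_scores


def rde_scores(base_scores, heads, x):
    """Base scores plus the mean correction of all heads (assumed combination rule)."""
    correction = np.mean([head_forward(h, x) for h in heads], axis=0)
    return base_scores + correction


rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))             # 4 documents, 8 features
base_scores = rng.uniform(size=(4, 5))  # scores from any base model, 5 labels
heads = [init_head(rng, 8, 16, 5) for _ in range(3)]
scores = rde_scores(base_scores, heads, x)
```

In this sketch the base model is left untouched and only the heads would be trained, which matches the abstract's description of RDE as an add-on to existing models. How diversity among the heads is encouraged during training is not covered here.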
Pages: 14