Residual diverse ensemble for long-tailed multi-label text classification

Cited by: 1
Authors
Shi, Jiangxin [1, 2]
Wei, Tong [3, 4]
Li, Yufeng [1, 2]
Affiliations
[1] Nanjing Univ, Natl Key Lab Novel Software Technol, Nanjing 210023, Peoples R China
[2] Nanjing Univ, Sch Artificial Intelligence, Nanjing 210023, Peoples R China
[3] Southeast Univ, Sch Comp Sci & Engn, Nanjing 210096, Peoples R China
[4] Southeast Univ, Key Lab Comp Network & Informat Integrat, Minist Educ, Nanjing 210096, Peoples R China
Funding
National Key Research and Development Program of China; National Natural Science Foundation of China;
Keywords
multi-label learning; extreme multi-label learning; long-tailed distribution; multi-label text classification; ensemble learning;
DOI
10.1007/s11432-022-3915-6
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
Long-tailed multi-label text classification aims to identify a subset of relevant labels from a large candidate label set, where the training datasets usually follow long-tailed label distributions. Many previous studies treat head and tail labels equally, which results in unsatisfactory performance on tail labels. To address this issue, this paper proposes a novel learning method that can be combined with arbitrary models and consists of two steps. The first step is the "diverse ensemble", which encourages diverse predictions among multiple shallow classifiers, particularly on tail labels, and can thereby improve generalization on tail labels. The second step is the "error correction", which takes advantage of the base model's accurate predictions on head labels and approximates its residual errors on tail labels. This lets the "diverse ensemble" focus on optimizing tail label performance. The overall procedure is called residual diverse ensemble (RDE). RDE is implemented via a single-hidden-layer perceptron and scales to hundreds of thousands of labels. We empirically show that RDE consistently improves many existing models with considerable performance gains on benchmark datasets, especially with respect to the propensity-scored evaluation metrics. Moreover, RDE converges in fewer than 30 training epochs without increasing the computational overhead.
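The two-step idea in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not say how the shallow classifiers are combined or trained, so averaging their outputs and adding the result to the base model's logits is an assumption made here for illustration. The names `init_head`, `head_forward`, and `combine` are hypothetical, and the toy inputs are random placeholders rather than real data.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_head(n_features, n_hidden, n_labels, rng):
    """Randomly initialise one shallow head (single hidden layer)."""
    w1 = rng.normal(0.0, 0.1, (n_features, n_hidden))
    b1 = np.zeros(n_hidden)
    w2 = rng.normal(0.0, 0.1, (n_hidden, n_labels))
    b2 = np.zeros(n_labels)
    return (w1, b1, w2, b2)

def head_forward(x, head):
    """Single-hidden-layer perceptron: ReLU hidden layer, linear output."""
    w1, b1, w2, b2 = head
    hidden = np.maximum(x @ w1 + b1, 0.0)
    return hidden @ w2 + b2

def combine(base_logits, x, heads):
    """Add the averaged correction of all heads to the base logits.

    Assumption: heads are averaged; the abstract does not specify this.
    Returns the corrected scores and the per-head corrections.
    """
    corrections = np.stack([head_forward(x, h) for h in heads])  # (K, n, L)
    return base_logits + corrections.mean(axis=0), corrections

# Toy placeholders: 4 documents, 8 features, 5 labels, 3 shallow heads.
x = rng.normal(size=(4, 8))
base_logits = rng.normal(size=(4, 5))  # stands in for a frozen base model
heads = [init_head(8, 16, 5, rng) for _ in range(3)]
scores, corrections = combine(base_logits, x, heads)
```

In this sketch the base model's logits pass through unchanged wherever the heads output zero, which mirrors the abstract's point that the ensemble only needs to model the residual error. Training the heads, including any diversity-encouraging term, is left out because the abstract gives no details.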
Pages: 14