InfoCL: Alleviating Catastrophic Forgetting in Continual Text Classification from An Information Theoretic Perspective

Cited by: 0
Authors
Song, Yifan [1]
Wang, Peiyi [1]
Xiong, Weimin [1]
Zhu, Dawei [1]
Liu, Tianyu [2]
Sui, Zhifang [1]
Li, Sujian [1]
Affiliations
[1] Peking Univ, Sch Comp Sci, Natl Key Lab Multimedia Informat Proc, Beijing, Peoples R China
[2] Alibaba Grp, Hangzhou, Peoples R China
Funding
National Key Research and Development Program of China
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Continual learning (CL) aims to constantly learn new knowledge over time while avoiding catastrophic forgetting on old tasks. We focus on continual text classification under the class-incremental setting. Recent CL studies have identified the severe performance decrease on analogous classes as a key factor for catastrophic forgetting. In this paper, through an in-depth exploration of the representation learning process in CL, we discover that the compression effect of the information bottleneck leads to confusion on analogous classes. To enable the model to learn more sufficient representations, we propose a novel replay-based continual text classification method, InfoCL. Our approach utilizes fast-slow and current-past contrastive learning to perform mutual information maximization and better recover the previously learned representations. In addition, InfoCL incorporates an adversarial memory augmentation strategy to alleviate the overfitting problem of replay. Experimental results demonstrate that InfoCL effectively mitigates forgetting and achieves state-of-the-art performance on three text classification tasks. The code is publicly available at https://github.com/Yifan-Song793/InfoCL.
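
The abstract describes contrastive learning as a way to perform mutual information maximization. As a rough illustration of that general idea only, and not the authors' implementation, the following minimal sketch computes an in-batch InfoNCE loss with NumPy. The function names `info_nce` and `mi_lower_bound`, the temperature value, and the choice of NumPy are assumptions made for this example; the fast-slow and current-past pairing schemes of InfoCL are not reproduced here.

```python
import numpy as np


def info_nce(anchors, positives, temperature=0.1):
    """In-batch InfoNCE loss (illustrative; names and defaults are assumptions).

    Row i of `anchors` is paired with row i of `positives`; every other row
    of `positives` serves as a negative for anchor i.
    """
    # L2-normalize so that dot products are cosine similarities.
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature  # shape (N, N); diagonal = positive pairs
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Cross-entropy of identifying the matching positive for each anchor.
    return float(-np.mean(np.diag(log_probs)))


def mi_lower_bound(loss, batch_size):
    """Standard InfoNCE bound: I(X; Y) >= log(N) - loss."""
    return float(np.log(batch_size) - loss)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(8, 16))
    noisy_view = x + 0.05 * rng.normal(size=x.shape)  # hypothetical second view
    loss = info_nce(x, noisy_view)
    print("loss:", loss, "MI lower bound:", mi_lower_bound(loss, len(x)))
```

Minimizing this loss raises a lower bound on the mutual information between the two sets of representations, which is the usual sense in which a contrastive objective maximizes mutual information.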
Pages: 14557-14570
Page count: 14