Revisiting Neural Networks for Continual Learning: An Architectural Perspective

Cited: 0
Authors:
Lu, Aojun [1 ]
Feng, Tao [3 ]
Yuan, Hangjie [2 ]
Song, Xiaotian [1 ]
Sun, Yanan [1 ]
Affiliations:
[1] Sichuan Univ, Chengdu, Sichuan, Peoples R China
[2] Tsinghua Univ, Beijing, Peoples R China
[3] Zhejiang Univ, Hangzhou, Zhejiang, Peoples R China
Source:
PROCEEDINGS OF THE THIRTY-THIRD INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2024 | 2024
Funding:
National Natural Science Foundation of China
DOI:
Not available
Chinese Library Classification (CLC):
TP18 [Theory of Artificial Intelligence]
Discipline Classification Codes:
081104; 0812; 0835; 1405
Abstract:
Efforts to overcome catastrophic forgetting have primarily centered on developing more effective Continual Learning (CL) methods. In contrast, less attention has been devoted to analyzing the role of network architecture design (e.g., network depth, width, and components) in contributing to CL. This paper seeks to bridge the gap between network architecture design and CL, and presents a holistic study of the impact of network architectures on CL. This work considers architecture design at the network scaling level, i.e., width and depth, and also at the level of network components, i.e., skip connections, global pooling layers, and down-sampling. In both cases, we first derive insights by systematically exploring how architectural designs affect CL. Then, grounded in these insights, we craft a specialized search space for CL and further propose a simple yet effective ArchCraft method to steer toward a CL-friendly architecture; namely, this method recrafts AlexNet/ResNet into AlexAC/ResAC. Experimental validation across various CL settings and scenarios demonstrates that the improved architectures are parameter-efficient, achieving state-of-the-art CL performance while being 86%, 61%, and 97% more compact in terms of parameters than the naive CL architecture in Task IL and Class IL. Code is available at https://github.com/byyx666/ArchCraft.
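To make the abstract's search-space idea concrete, below is a minimal, hypothetical PyTorch sketch of a CNN whose scaling (width, depth) and components (skip connections, global pooling, down-sampling) are exposed as explicit configuration choices. All class names, parameters, and defaults here (ConfigurableBlock, ConfigurableNet, the widths/depths tuples) are illustrative assumptions of this summary, not the paper's actual ArchCraft implementation; see the repository linked above for the authors' code.

```python
# Hypothetical sketch (not the paper's code): a CNN whose scaling (width,
# depth) and components (skip connections, global pooling, down-sampling)
# are explicit configuration choices, i.e., one point in a CL search space.
import torch
import torch.nn as nn


class ConfigurableBlock(nn.Module):
    """Conv block whose skip connection and down-sampling are search choices."""

    def __init__(self, in_ch, out_ch, use_skip=True, downsample=False):
        super().__init__()
        stride = 2 if downsample else 1
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_ch)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        self.use_skip = use_skip
        # 1x1 projection keeps the residual shape-compatible when the
        # channel count or spatial resolution changes.
        needs_proj = use_skip and (in_ch != out_ch or downsample)
        self.proj = (nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False)
                     if needs_proj else nn.Identity())
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.bn2(self.conv2(self.act(self.bn1(self.conv1(x)))))
        if self.use_skip:
            out = out + self.proj(x)
        return self.act(out)


class ConfigurableNet(nn.Module):
    """Width/depth and component flags together form one architecture genotype."""

    def __init__(self, widths=(64, 128, 256), depths=(2, 2, 2),
                 use_skip=True, global_pool=True, num_classes=10):
        super().__init__()
        layers = [nn.Conv2d(3, widths[0], 3, padding=1, bias=False),
                  nn.BatchNorm2d(widths[0]), nn.ReLU(inplace=True)]
        in_ch = widths[0]
        for stage, (w, d) in enumerate(zip(widths, depths)):
            for i in range(d):
                # Down-sample at the first block of each stage after the first.
                layers.append(ConfigurableBlock(in_ch, w, use_skip,
                                                downsample=(stage > 0 and i == 0)))
                in_ch = w
        self.features = nn.Sequential(*layers)
        # Global pooling vs. a fixed spatial grid is itself a component choice.
        self.pool = nn.AdaptiveAvgPool2d(1) if global_pool else nn.AdaptiveAvgPool2d(4)
        feat_dim = in_ch if global_pool else in_ch * 16
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, x):
        return self.classifier(self.pool(self.features(x)).flatten(1))


# Example genotype: wider but shallower, the kind of trade-off the paper studies.
net = ConfigurableNet(widths=(128, 256, 512), depths=(1, 1, 1), use_skip=True)
x = torch.randn(2, 3, 32, 32)
print(net(x).shape)                              # torch.Size([2, 10])
print(sum(p.numel() for p in net.parameters()))  # parameter count for comparison
```

Under this framing, a search procedure would evaluate candidate (widths, depths, component-flag) genotypes under Task IL/Class IL protocols and retain those that balance stability and plasticity; per the abstract, the recrafted AlexAC/ResAC networks are the outcome of such a CL-specific search.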
Pages: 4651-4659
Number of pages: 9