Revisiting Neural Networks for Continual Learning: An Architectural Perspective

Cited: 0
Authors:
Lu, Aojun [1 ]
Feng, Tao [3 ]
Yuan, Hangjie [2 ]
Song, Xiaotian [1 ]
Sun, Yanan [1 ]
Affiliations:
[1] Sichuan Univ, Chengdu, Sichuan, Peoples R China
[2] Tsinghua Univ, Beijing, Peoples R China
[3] Zhejiang Univ, Hangzhou, Zhejiang, Peoples R China
Source:
PROCEEDINGS OF THE THIRTY-THIRD INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2024 | 2024
Funding:
National Natural Science Foundation of China
DOI:
Not available
Chinese Library Classification (CLC):
TP18 [Theory of Artificial Intelligence]
Discipline Classification Codes:
081104; 0812; 0835; 1405
Abstract:
Efforts to overcome catastrophic forgetting have primarily centered on developing more effective Continual Learning (CL) methods. In contrast, less attention has been devoted to analyzing the role of network architecture design (e.g., network depth, width, and components) in contributing to CL. This paper seeks to bridge the gap between network architecture design and CL, and presents a holistic study of the impact of network architectures on CL. This work considers architecture design at the network scaling level, i.e., width and depth, and also at the level of network components, i.e., skip connections, global pooling layers, and down-sampling. In both cases, we first derive insights by systematically exploring how architectural designs affect CL. Then, grounded in these insights, we craft a specialized search space for CL and further propose a simple yet effective ArchCraft method to steer toward a CL-friendly architecture; namely, this method recrafts AlexNet/ResNet into AlexAC/ResAC. Experimental validation across various CL settings and scenarios demonstrates that the improved architectures are parameter-efficient, achieving state-of-the-art CL performance while being 86%, 61%, and 97% more compact in terms of parameters than the naive CL architecture in Task IL and Class IL. Code is available at https://github.com/byyx666/ArchCraft.
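To make the abstract's search-space idea concrete, below is a minimal, hypothetical PyTorch sketch of a CNN whose scaling (width, depth) and components (skip connections, global pooling, down-sampling) are exposed as explicit configuration choices. All class names, parameters, and defaults here (ConfigurableBlock, ConfigurableNet, the widths/depths tuples) are illustrative assumptions of this summary, not the paper's actual ArchCraft implementation; see the repository linked above for the authors' code.

```python
# Hypothetical sketch (not the paper's code): a CNN whose scaling (width,
# depth) and components (skip connections, global pooling, down-sampling)
# are explicit configuration choices, i.e., one point in a CL search space.
import torch
import torch.nn as nn


class ConfigurableBlock(nn.Module):
    """Conv block whose skip connection and down-sampling are search choices."""

    def __init__(self, in_ch, out_ch, use_skip=True, downsample=False):
        super().__init__()
        stride = 2 if downsample else 1
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_ch)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        self.use_skip = use_skip
        # 1x1 projection keeps the residual shape-compatible when the
        # channel count or spatial resolution changes.
        needs_proj = use_skip and (in_ch != out_ch or downsample)
        self.proj = (nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False)
                     if needs_proj else nn.Identity())
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.bn2(self.conv2(self.act(self.bn1(self.conv1(x)))))
        if self.use_skip:
            out = out + self.proj(x)
        return self.act(out)


class ConfigurableNet(nn.Module):
    """Width/depth and component flags together form one architecture genotype."""

    def __init__(self, widths=(64, 128, 256), depths=(2, 2, 2),
                 use_skip=True, global_pool=True, num_classes=10):
        super().__init__()
        layers = [nn.Conv2d(3, widths[0], 3, padding=1, bias=False),
                  nn.BatchNorm2d(widths[0]), nn.ReLU(inplace=True)]
        in_ch = widths[0]
        for stage, (w, d) in enumerate(zip(widths, depths)):
            for i in range(d):
                # Down-sample at the first block of each stage after the first.
                layers.append(ConfigurableBlock(in_ch, w, use_skip,
                                                downsample=(stage > 0 and i == 0)))
                in_ch = w
        self.features = nn.Sequential(*layers)
        # Global pooling vs. a fixed spatial grid is itself a component choice.
        self.pool = nn.AdaptiveAvgPool2d(1) if global_pool else nn.AdaptiveAvgPool2d(4)
        feat_dim = in_ch if global_pool else in_ch * 16
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, x):
        return self.classifier(self.pool(self.features(x)).flatten(1))


# Example genotype: wider but shallower, the kind of trade-off the paper studies.
net = ConfigurableNet(widths=(128, 256, 512), depths=(1, 1, 1), use_skip=True)
x = torch.randn(2, 3, 32, 32)
print(net(x).shape)                              # torch.Size([2, 10])
print(sum(p.numel() for p in net.parameters()))  # parameter count for comparison
```

Under this framing, a search procedure would evaluate candidate (widths, depths, component-flag) genotypes under Task IL/Class IL protocols and retain those that balance stability and plasticity; per the abstract, the recrafted AlexAC/ResAC networks are the outcome of such a CL-specific search.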
Pages: 4651-4659
Number of pages: 9