Accelerating DNN Architecture Search at Scale Using Selective Weight Transfer

Cited by: 4
Authors
Liu, Hongyuan [1 ]
Nicolae, Bogdan [2 ]
Di, Sheng [2 ]
Cappello, Franck [2 ]
Jog, Adwait [1 ]
Affiliations
[1] William & Mary, Williamsburg, VA 23185 USA
[2] Argonne Natl Lab, Lemont, IL USA
Keywords
Deep Learning; Neural Architecture Search; Checkpointing;
DOI
10.1109/Cluster48925.2021.00051
CLC classification
TP3 [Computing technology; computer technology]
Discipline code
0812
Abstract
Deep learning applications are rapidly gaining traction both in industry and in scientific computing. Unsurprisingly, there has been significant interest in adopting deep learning at very large scale on supercomputing infrastructures for a variety of scientific applications. A key issue in this context is how to find a model architecture suitable for solving the problem; we call this the neural architecture search (NAS) problem. Over time, many automated approaches have been proposed that can explore a large number of candidate models. However, this remains a time-consuming and resource-intensive process: the candidates are often trained from scratch for a small number of epochs in order to obtain a set of top-K best performers, which are then fully trained in a second phase. To address this problem, we propose a novel method that leverages checkpoints of previously discovered candidates to accelerate NAS. Based on the observation that the candidates feature high structural similarity, we propose that new candidates need not be trained starting from random weights, but rather from the weights of similar layers of previously evaluated candidates. Thanks to this approach, the convergence of the candidate models can be significantly accelerated, producing candidates that are statistically better on the objective metrics. Furthermore, once the top-K models are identified, our approach provides a significant speed-up (1.4x to 1.5x on average) for the full training.
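The warm-start idea described above can be sketched as follows. This is a minimal illustration, not the paper's exact algorithm: the function names, the dict-based weight representation, and the name-plus-shape matching heuristic are all assumptions for the sake of the example. Layers of a new NAS candidate that match a previously trained (donor) candidate inherit its checkpointed weights; all other layers fall back to random initialization.

```python
import random

def init_weights(shape):
    # Hypothetical random initializer standing in for a framework's default init.
    n = 1
    for d in shape:
        n *= d
    return [random.gauss(0.0, 0.1) for _ in range(n)]

def selective_transfer(new_arch, donor_weights, donor_shapes):
    """Warm-start a new NAS candidate from a previously evaluated one.

    new_arch:      dict layer_name -> weight shape of the new candidate
    donor_weights: dict layer_name -> flat weight list from the donor checkpoint
    donor_shapes:  dict layer_name -> weight shape of the donor
    Layers whose name and shape both match the donor reuse its trained
    weights; every other layer is randomly initialized as usual.
    """
    weights, transferred = {}, []
    for name, shape in new_arch.items():
        if name in donor_weights and donor_shapes.get(name) == shape:
            weights[name] = list(donor_weights[name])  # copy trained weights
            transferred.append(name)
        else:
            weights[name] = init_weights(shape)        # train from scratch
    return weights, transferred

# Example: the new candidate shares conv1 with the donor but widens the rest.
donor_shapes = {"conv1": (3, 16), "fc": (16, 10)}
donor_weights = {"conv1": [0.5] * 48, "fc": [0.1] * 160}
new_arch = {"conv1": (3, 16), "conv2": (16, 32), "fc": (32, 10)}
weights, transferred = selective_transfer(new_arch, donor_weights, donor_shapes)
# transferred -> ["conv1"]; conv2 and fc start from random weights.
```

In a real framework this corresponds to partially loading a checkpoint into a structurally similar model, so only the mismatched layers pay the cost of training from random initialization.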
Pages: 82-93
Page count: 12
Related papers
50 records in total
  • [1] EAT-NAS: elastic architecture transfer for accelerating large-scale neural architecture search
    Jiemin FANG
    Yukang CHEN
    Xinbang ZHANG
    Qian ZHANG
    Chang HUANG
    Gaofeng MENG
    Wenyu LIU
    Xinggang WANG
    Science China (Information Sciences), 2021, 64 (09) : 103 - 115
  • [2] EAT-NAS: elastic architecture transfer for accelerating large-scale neural architecture search
    Fang, Jiemin
    Chen, Yukang
    Zhang, Xinbang
    Zhang, Qian
    Huang, Chang
    Meng, Gaofeng
    Liu, Wenyu
    Wang, Xinggang
    SCIENCE CHINA-INFORMATION SCIENCES, 2021, 64 (09)
  • [3] EAT-NAS: elastic architecture transfer for accelerating large-scale neural architecture search
    Jiemin Fang
    Yukang Chen
    Xinbang Zhang
    Qian Zhang
    Chang Huang
    Gaofeng Meng
    Wenyu Liu
    Xinggang Wang
    Science China Information Sciences, 2021, 64
  • [4] Accelerating DNN Training Through Selective Localized Learning
    Krithivasan, Sarada
    Sen, Sanchari
    Venkataramani, Swagath
    Raghunathan, Anand
    FRONTIERS IN NEUROSCIENCE, 2022, 15
  • [5] Optimal DNN architecture search using Bayesian Optimization Hyperband for arrhythmia detection
    Han, Seungwoo
    Eom, Heesang
    Kim, Juhyeong
    Park, Cheolsoo
    2020 IEEE WIRELESS POWER TRANSFER CONFERENCE (WPTC), 2020, : 357 - 360
  • [6] Accelerating multi-objective neural architecture search by random-weight evaluation
    Shengran Hu
    Ran Cheng
    Cheng He
    Zhichao Lu
    Jing Wang
    Miao Zhang
    Complex & Intelligent Systems, 2023, 9 : 1183 - 1192
  • [7] Accelerating multi-objective neural architecture search by random-weight evaluation
    Hu, Shengran
    Cheng, Ran
    He, Cheng
    Lu, Zhichao
    Wang, Jing
    Zhang, Miao
    COMPLEX & INTELLIGENT SYSTEMS, 2023, 9 (02) : 1183 - 1192
  • [8] TENG: A General-Purpose and Efficient Processor Architecture for Accelerating DNN
    Zhang, Zekun
    Cai, Yujie
    Liao, Tianjiao
    Xu, Chengyu
    Jiao, Xin
    2024 IEEE 6TH INTERNATIONAL CONFERENCE ON AI CIRCUITS AND SYSTEMS, AICAS 2024, 2024, : 149 - 153
  • [9] Composable Workflow for Accelerating Neural Architecture Search Using In Situ Analytics for Protein Classification
    Channing, Georgia
    Patel, Ria
    Olaya, Paula
    Rorabaugh, Ariel Keller
    Miyashita, Osamu
    Caino-Lores, Silvina
    Schuman, Catherine
    Tama, Florence
    Taufer, Michela
    PROCEEDINGS OF THE 52ND INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING, ICPP 2023, 2023, : 756 - 765
  • [10] A Unified Architecture for Accelerating Distributed DNN Training in Heterogeneous GPU/CPU Clusters
    Jiang, Yimin
    Zhu, Yibo
    Lan, Chang
    Yi, Bairen
    Cui, Yong
    Guo, Chuanxiong
    PROCEEDINGS OF THE 14TH USENIX SYMPOSIUM ON OPERATING SYSTEMS DESIGN AND IMPLEMENTATION (OSDI '20), 2020, : 463 - 479