TransCODE: Co-Design of Transformers and Accelerators for Efficient Training and Inference

Cited by: 2
Authors
Tuli, Shikhar [1 ]
Jha, Niraj K. [1 ]
Affiliations
[1] Princeton Univ, Dept Elect & Comp Engn, Princeton, NJ 08544 USA
Keywords
Application-specific integrated circuits (ASICs); hardware-software co-design; machine learning; neural network accelerators; transformers; ALGORITHM; MODEL
DOI
10.1109/TCAD.2023.3283443
CLC number: TP3 [Computing technology, computer technology]
Subject classification code: 0812
Abstract
Automated co-design of machine learning models and the hardware on which they run is critical for deploying such models efficiently at scale. Despite the state-of-the-art performance of transformer models, they are not yet ready for execution on resource-constrained hardware platforms; the high memory requirements and low parallelizability of the transformer architecture exacerbate the problem. Recently proposed accelerators attempt to optimize the throughput and energy consumption of transformer models, but such works are limited either to a one-sided search of the model architecture or to a restricted set of off-the-shelf devices. Furthermore, previous works accelerate only model inference, not training, even though training demands substantially more memory and compute resources and is therefore even more challenging to support. To address these limitations, this work proposes a dynamic training framework, called DynaProp, that speeds up the training process and reduces memory consumption. DynaProp is a low-overhead pruning method that prunes activations and gradients at runtime. To execute this method efficiently on hardware for a diverse set of transformer architectures, we propose a flexible simulation framework that models transformer inference and training over a design space of accelerators. We use this simulator in conjunction with the proposed co-design technique, called TransCODE, to obtain models that achieve high accuracy on the given task while minimizing latency, energy consumption, and chip area. The obtained transformer-accelerator pair achieves 0.3% higher accuracy than the state-of-the-art pair while incurring 5.2x lower latency and 3.0x lower energy consumption.
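
To make the runtime-pruning idea behind DynaProp concrete, the sketch below zeroes out low-magnitude activations in the forward pass and low-magnitude gradients in the backward pass. This is a minimal illustration only: the top-k magnitude criterion, the keep_ratio parameter, and the PrunedReLU module are assumptions introduced here, not the paper's actual pruning rule or thresholds.

```python
# Minimal sketch of runtime activation/gradient pruning in the spirit of
# DynaProp. The top-k magnitude rule below is an illustrative assumption;
# the paper defines its own pruning criteria and hardware integration.
import torch


def prune_smallest(x: torch.Tensor, keep_ratio: float = 0.5) -> torch.Tensor:
    """Zero out all but the largest-magnitude entries of x."""
    k = max(1, int(keep_ratio * x.numel()))
    # The k-th largest magnitude is the (numel - k + 1)-th smallest one.
    threshold = x.abs().flatten().kthvalue(x.numel() - k + 1).values
    return torch.where(x.abs() >= threshold, x, torch.zeros_like(x))


class PrunedReLU(torch.nn.Module):
    """ReLU whose activations (forward) and gradients (backward) are pruned."""

    def __init__(self, keep_ratio: float = 0.5):
        super().__init__()
        self.keep_ratio = keep_ratio

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = prune_smallest(torch.relu(x), self.keep_ratio)
        if y.requires_grad:
            # Sparsify the incoming gradient during backpropagation as well.
            y.register_hook(lambda g: prune_smallest(g, self.keep_ratio))
        return y


if __name__ == "__main__":
    layer = PrunedReLU(keep_ratio=0.25)
    x = torch.randn(4, 16, requires_grad=True)
    out = layer(x)
    out.sum().backward()
    print(f"nonzero activations: {int((out != 0).sum())} / {out.numel()}")
    print(f"nonzero gradients:   {int((x.grad != 0).sum())} / {x.grad.numel()}")
```

Because both the stored activations and the propagated gradients become sparse, a pass like this can reduce training-time memory traffic and compute, which is the effect the abstract attributes to DynaProp.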
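The co-design objective described above (high task accuracy with low latency, energy, and area) can be pictured as a search over a joint model-accelerator design space. The sketch below uses random search with a weighted scalarized objective and analytical placeholder costs; the design-space parameters, proxy formulas, and search strategy are all assumptions for illustration and do not reflect TransCODE's actual simulator or optimizer.

```python
# Hedged sketch of transformer-accelerator co-design search. Every design
# knob and cost proxy below is a stand-in; a real flow would train/evaluate
# the model and run an accelerator simulator instead.
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class DesignPoint:
    num_layers: int   # transformer depth (model side)
    hidden_dim: int   # embedding width (model side)
    num_pes: int      # processing elements (accelerator side)
    buffer_kb: int    # on-chip buffer size (accelerator side)


def evaluate(p: DesignPoint) -> dict:
    """Placeholder for accuracy evaluation plus accelerator simulation."""
    accuracy = 0.80 + 0.01 * p.num_layers + 0.00001 * p.hidden_dim
    latency = p.num_layers * p.hidden_dim / (p.num_pes * 100.0)  # ms
    energy = latency * p.num_pes * 0.05                          # mJ
    area = p.num_pes * 0.02 + p.buffer_kb * 0.001                # mm^2
    return {"accuracy": accuracy, "latency": latency,
            "energy": energy, "area": area}


def objective(m: dict, w=(1.0, 0.1, 0.1, 0.05)) -> float:
    """Scalarized goal: reward accuracy, penalize latency/energy/area."""
    return (w[0] * m["accuracy"] - w[1] * m["latency"]
            - w[2] * m["energy"] - w[3] * m["area"])


def random_search(steps: int = 200, seed: int = 0) -> DesignPoint:
    rng = random.Random(seed)
    best, best_score = None, float("-inf")
    for _ in range(steps):
        p = DesignPoint(
            num_layers=rng.choice([2, 4, 6, 12]),
            hidden_dim=rng.choice([128, 256, 512, 768]),
            num_pes=rng.choice([16, 32, 64, 128]),
            buffer_kb=rng.choice([64, 128, 256]),
        )
        score = objective(evaluate(p))
        if score > best_score:
            best, best_score = p, score
    return best


if __name__ == "__main__":
    print(random_search())
```

In practice, a sample-efficient optimizer would replace random search, since each real evaluation involves training a model and running a detailed simulation rather than the cheap analytical proxies used here.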
Pages: 4817-4830 (14 pages)
Related Papers (showing 10 of 50)
  • [1] SECDA: Efficient Hardware/Software Co-Design of FPGA-based DNN Accelerators for Edge Inference
    Haris, Jude; Gibson, Perry; Cano, Jose; Agostini, Nicolas Bohm; Kaeli, David
    2021 IEEE 33rd International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), 2021: 33-43
  • [2] CoDA: A Co-Design Framework for Versatile and Efficient Attention Accelerators
    Li, Wenjie; Hu, Aokun; Xu, Ningyi; He, Guanghui
    IEEE Transactions on Computers, 2024, 73(8): 1924-1938
  • [3] Softermax: Hardware/Software Co-Design of an Efficient Softmax for Transformers
    Stevens, Jacob R.; Venkatesan, Rangharajan; Dai, Steve; Khailany, Brucek; Raghunathan, Anand
    2021 58th ACM/IEEE Design Automation Conference (DAC), 2021: 469-474
  • [4] Bit-Line Computing for CNN Accelerators Co-Design in Edge AI Inference
    Rios, Marco; Ponzina, Flavio; Levisse, Alexandre; Ansaloni, Giovanni; Atienza, David
    IEEE Transactions on Emerging Topics in Computing, 2023, 11(2): 358-372
  • [5] HBP: Hierarchically Balanced Pruning and Accelerator Co-Design for Efficient DNN Inference
    Ren, Ao; Wang, Yuhao; Zhang, Tao; Shi, Jiaxing; Liu, Duo; Chen, Xianzhang; Tan, Yujuan; Xie, Yuan
    2023 60th ACM/IEEE Design Automation Conference (DAC), 2023
  • [6] Hardware/Software Co-design for Machine Learning Accelerators
    Chen, Hanqiu; Hao, Cong
    2023 IEEE 31st Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), 2023: 233-235
  • [7] Design Intention Inference for Virtual Co-Design Agents
    Law, Matthew V.; Kwatra, Amritansh; Dhawan, Nikhil; Einhorn, Matthew; Rajesh, Amit; Hoffman, Guy
    Proceedings of the 20th ACM International Conference on Intelligent Virtual Agents (ACM IVA 2020), 2020
  • [8] High Performance, Power Efficient Hardware Accelerators: Emerging Devices, Circuits and Architecture Co-design
    Graves, Catherine
    CF '19: Proceedings of the 16th ACM International Conference on Computing Frontiers, 2019: 1
  • [9] HW/SW Co-Design of Cost-Efficient CNN Inference for Cognitive IoT
    Lee, Kwangho; Kong, Joonho; Munir, Arslan
    2020 Fourth International Conference on Intelligent Computing in Data Sciences (ICDS), 2020
  • [10] SOLE: Hardware-Software Co-design of Softmax and LayerNorm for Efficient Transformer Inference
    Wang, Wenxun; Zhou, Shuchang; Sun, Wenyu; Sun, Peiqin; Liu, Yongpan
    2023 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), 2023