TransCODE: Co-Design of Transformers and Accelerators for Efficient Training and Inference

Cited by: 2
Authors
Tuli, Shikhar [1 ]
Jha, Niraj K. [1 ]
Affiliations
[1] Princeton Univ, Dept Elect & Comp Engn, Princeton, NJ 08544 USA
Keywords
Application-specific integrated circuits (ASICs); hardware-software co-design; machine learning; neural network accelerators; transformers; ALGORITHM; MODEL
DOI
10.1109/TCAD.2023.3283443
CLC number: TP3 [Computing technology, computer technology]
Subject classification code: 0812
Abstract
Automated co-design of machine learning models and the hardware on which they run is critical for deploying such models efficiently at scale. Despite the state-of-the-art performance of transformer models, they are not yet ready for execution on resource-constrained hardware platforms; the high memory requirements and low parallelizability of the transformer architecture exacerbate the problem. Recently proposed accelerators attempt to optimize the throughput and energy consumption of transformer models, but such works are limited either to a one-sided search of the model architecture or to a restricted set of off-the-shelf devices. Furthermore, previous works accelerate only model inference, not training, even though training demands substantially more memory and compute resources and is therefore even more challenging to support. To address these limitations, this work proposes a dynamic training framework, called DynaProp, that speeds up the training process and reduces memory consumption. DynaProp is a low-overhead pruning method that prunes activations and gradients at runtime. To execute this method efficiently on hardware for a diverse set of transformer architectures, we propose a flexible simulation framework that models transformer inference and training over a design space of accelerators. We use this simulator in conjunction with the proposed co-design technique, called TransCODE, to obtain models that achieve high accuracy on the given task while minimizing latency, energy consumption, and chip area. The obtained transformer-accelerator pair achieves 0.3% higher accuracy than the state-of-the-art pair while incurring 5.2x lower latency and 3.0x lower energy consumption.
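
To make the runtime-pruning idea behind DynaProp concrete, the sketch below zeroes out low-magnitude activations in the forward pass and low-magnitude gradients in the backward pass. This is a minimal illustration only: the top-k magnitude criterion, the keep_ratio parameter, and the PrunedReLU module are assumptions introduced here, not the paper's actual pruning rule or thresholds.

```python
# Minimal sketch of runtime activation/gradient pruning in the spirit of
# DynaProp. The top-k magnitude rule below is an illustrative assumption;
# the paper defines its own pruning criteria and hardware integration.
import torch


def prune_smallest(x: torch.Tensor, keep_ratio: float = 0.5) -> torch.Tensor:
    """Zero out all but the largest-magnitude entries of x."""
    k = max(1, int(keep_ratio * x.numel()))
    # The k-th largest magnitude is the (numel - k + 1)-th smallest one.
    threshold = x.abs().flatten().kthvalue(x.numel() - k + 1).values
    return torch.where(x.abs() >= threshold, x, torch.zeros_like(x))


class PrunedReLU(torch.nn.Module):
    """ReLU whose activations (forward) and gradients (backward) are pruned."""

    def __init__(self, keep_ratio: float = 0.5):
        super().__init__()
        self.keep_ratio = keep_ratio

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = prune_smallest(torch.relu(x), self.keep_ratio)
        if y.requires_grad:
            # Sparsify the incoming gradient during backpropagation as well.
            y.register_hook(lambda g: prune_smallest(g, self.keep_ratio))
        return y


if __name__ == "__main__":
    layer = PrunedReLU(keep_ratio=0.25)
    x = torch.randn(4, 16, requires_grad=True)
    out = layer(x)
    out.sum().backward()
    print(f"nonzero activations: {int((out != 0).sum())} / {out.numel()}")
    print(f"nonzero gradients:   {int((x.grad != 0).sum())} / {x.grad.numel()}")
```

Because both the stored activations and the propagated gradients become sparse, a pass like this can reduce training-time memory traffic and compute, which is the effect the abstract attributes to DynaProp.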
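The co-design objective described above (high task accuracy with low latency, energy, and area) can be pictured as a search over a joint model-accelerator design space. The sketch below uses random search with a weighted scalarized objective and analytical placeholder costs; the design-space parameters, proxy formulas, and search strategy are all assumptions for illustration and do not reflect TransCODE's actual simulator or optimizer.

```python
# Hedged sketch of transformer-accelerator co-design search. Every design
# knob and cost proxy below is a stand-in; a real flow would train/evaluate
# the model and run an accelerator simulator instead.
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class DesignPoint:
    num_layers: int   # transformer depth (model side)
    hidden_dim: int   # embedding width (model side)
    num_pes: int      # processing elements (accelerator side)
    buffer_kb: int    # on-chip buffer size (accelerator side)


def evaluate(p: DesignPoint) -> dict:
    """Placeholder for accuracy evaluation plus accelerator simulation."""
    accuracy = 0.80 + 0.01 * p.num_layers + 0.00001 * p.hidden_dim
    latency = p.num_layers * p.hidden_dim / (p.num_pes * 100.0)  # ms
    energy = latency * p.num_pes * 0.05                          # mJ
    area = p.num_pes * 0.02 + p.buffer_kb * 0.001                # mm^2
    return {"accuracy": accuracy, "latency": latency,
            "energy": energy, "area": area}


def objective(m: dict, w=(1.0, 0.1, 0.1, 0.05)) -> float:
    """Scalarized goal: reward accuracy, penalize latency/energy/area."""
    return (w[0] * m["accuracy"] - w[1] * m["latency"]
            - w[2] * m["energy"] - w[3] * m["area"])


def random_search(steps: int = 200, seed: int = 0) -> DesignPoint:
    rng = random.Random(seed)
    best, best_score = None, float("-inf")
    for _ in range(steps):
        p = DesignPoint(
            num_layers=rng.choice([2, 4, 6, 12]),
            hidden_dim=rng.choice([128, 256, 512, 768]),
            num_pes=rng.choice([16, 32, 64, 128]),
            buffer_kb=rng.choice([64, 128, 256]),
        )
        score = objective(evaluate(p))
        if score > best_score:
            best, best_score = p, score
    return best


if __name__ == "__main__":
    print(random_search())
```

In practice, a sample-efficient optimizer would replace random search, since each real evaluation involves training a model and running a detailed simulation rather than the cheap analytical proxies used here.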
Pages: 4817-4830 (14 pages)
Related Papers (showing 10 of 50)
  • [1] SECDA: Efficient Hardware/Software Co-Design of FPGA-based DNN Accelerators for Edge Inference
    Haris, Jude; Gibson, Perry; Cano, Jose; Agostini, Nicolas Bohm; Kaeli, David
    2021 IEEE 33rd International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), 2021: 33-43
  • [2] CoDA: A Co-Design Framework for Versatile and Efficient Attention Accelerators
    Li, Wenjie; Hu, Aokun; Xu, Ningyi; He, Guanghui
    IEEE Transactions on Computers, 2024, 73(8): 1924-1938
  • [3] Softermax: Hardware/Software Co-Design of an Efficient Softmax for Transformers
    Stevens, Jacob R.; Venkatesan, Rangharajan; Dai, Steve; Khailany, Brucek; Raghunathan, Anand
    2021 58th ACM/IEEE Design Automation Conference (DAC), 2021: 469-474
  • [4] Bit-Line Computing for CNN Accelerators Co-Design in Edge AI Inference
    Rios, Marco; Ponzina, Flavio; Levisse, Alexandre; Ansaloni, Giovanni; Atienza, David
    IEEE Transactions on Emerging Topics in Computing, 2023, 11(2): 358-372
  • [5] HBP: Hierarchically Balanced Pruning and Accelerator Co-Design for Efficient DNN Inference
    Ren, Ao; Wang, Yuhao; Zhang, Tao; Shi, Jiaxing; Liu, Duo; Chen, Xianzhang; Tan, Yujuan; Xie, Yuan
    2023 60th ACM/IEEE Design Automation Conference (DAC), 2023
  • [6] Hardware/Software Co-design for Machine Learning Accelerators
    Chen, Hanqiu; Hao, Cong
    2023 IEEE 31st Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), 2023: 233-235
  • [7] Design Intention Inference for Virtual Co-Design Agents
    Law, Matthew V.; Kwatra, Amritansh; Dhawan, Nikhil; Einhorn, Matthew; Rajesh, Amit; Hoffman, Guy
    Proceedings of the 20th ACM International Conference on Intelligent Virtual Agents (ACM IVA 2020), 2020
  • [8] High Performance, Power Efficient Hardware Accelerators: Emerging Devices, Circuits and Architecture Co-design
    Graves, Catherine
    CF '19: Proceedings of the 16th ACM International Conference on Computing Frontiers, 2019: 1
  • [9] HW/SW Co-Design of Cost-Efficient CNN Inference for Cognitive IoT
    Lee, Kwangho; Kong, Joonho; Munir, Arslan
    2020 Fourth International Conference on Intelligent Computing in Data Sciences (ICDS), 2020
  • [10] SOLE: Hardware-Software Co-design of Softmax and LayerNorm for Efficient Transformer Inference
    Wang, Wenxun; Zhou, Shuchang; Sun, Wenyu; Sun, Peiqin; Liu, Yongpan
    2023 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), 2023