H3D-Transformer: A Heterogeneous 3D (H3D) Computing Platform for Transformer Model Acceleration on Edge Devices

Cited by: 4
Authors
Luo, Yandong [1, 2, 3]
Yu, Shimeng [1]
Affiliations
[1] Georgia Inst Technol, Sch Elect & Comp Engn, 791 Atlantic Dr NW, Atlanta, GA 30332 USA
[2] Georgia Inst Technol, Atlanta, GA USA
[3] Apple, Cupertino, CA 95014 USA
Keywords
Compute-in-memory; DNN accelerator; heterogeneous 3D integration; multi-head self-attention; transformer; MEMORY SRAM MACRO
DOI
10.1145/3649219
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Prior hardware accelerator designs primarily focused on single-chip solutions for 10 MB-class computer vision models. The GB-class transformer models for natural language processing (NLP) impose challenges on existing accelerator design due to the massive number of parameters and the diverse matrix multiplication (MatMul) workloads involved. This work proposes a heterogeneous 3D-based accelerator design for transformer models, which adopts an interposer substrate with multiple 3D memory/logic hybrid cubes optimized for accelerating different MatMul workloads. An approximate computing scheme is proposed to take advantage of the heterogeneous computing paradigms of mixed-signal compute-in-memory (CIM) and digital tensor processing units (TPU). From the system-level evaluation results, 10 TOPS/W energy efficiency is achieved for the BERT and GPT2 models, which is about 2.6× to 3.1× higher than the baseline with 7 nm TPU and stacked FeFET memory.
Pages: 20
Related papers
50 items in total
  • [31] Parallel computing of 3D smoking simulation based on OpenCL heterogeneous platform
    Yuan, Zhiyong
    Si, Weixin
    Liao, Xiangyun
    Duan, Zhaoliang
    Ding, Yihua
    Zhao, Jianhui
    JOURNAL OF SUPERCOMPUTING, 2012, 61 (01) : 84 - 102
  • [32] Parallel computing of 3D smoking simulation based on OpenCL heterogeneous platform
    Zhiyong Yuan
    Weixin Si
    Xiangyun Liao
    Zhaoliang Duan
    Yihua Ding
    Jianhui Zhao
    The Journal of Supercomputing, 2012, 61 : 84 - 102
  • [33] 3D-VisTA: Pre-trained Transformer for 3D Vision and Text Alignment
    Zhu, Ziyu
    Ma, Xiaojian
    Chen, Yixin
    Deng, Zhidong
    Huang, Siyuan
    Li, Qing
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV, 2023, : 2899 - 2909
  • [34] D3BT: Dynamic 3D Body Transformer for Body Fat Percentage Assessment
    Zheng, Yijiang
    Long, Zhuoxin
    Feng, Boyuan
    Cheng, Ruting
    Vaziri, Khashayar
    Hahn, James K.
    IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, 2025, 29 (02) : 848 - 856
  • [35] SAT3D: Slot Attention Transformer for 3D Point Cloud Semantic Segmentation
    Ibrahim, Muhammad
    Akhtar, Naveed
    Anwar, Saeed
    Mian, Ajmal
    IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2023, 24 (05) : 5456 - 5466
  • [36] Computing in 3D
    Franzon, Paul D.
    Rotenberg, Eric
    Davis, W. Rhett
    Tuck, James
    Zhou, Huiyang
    Schabel, Joshua
    Zhang, Zhenquian
    Dwiel, J. Brandon
    Forbes, Elliott
    Huh, Joonmoo
    Tshibangu, Marcus
    Lipa, Steve
    2015 INTERNATIONAL 3D SYSTEMS INTEGRATION CONFERENCE (3DIC 2015), 2015,
  • [37] Computing in 3D
    Franzon, Paul
    Rotenberg, Eric
    Tuck, James
    Davis, W. Rhett
    Zhou, Huiyang
    Schabel, Joshua
    Zhang, Zhenquian
    Dwiel, J. Brandon
    Forbes, Elliott
    Huh, Joonmoo
    Lipa, Steve
    2015 IEEE CUSTOM INTEGRATED CIRCUITS CONFERENCE (CICC), 2015,
  • [38] EPT-Net: Edge Perception Transformer for 3D Medical Image Segmentation
    Yang, Jingyi
    Jiao, Licheng
    Shang, Ronghua
    Liu, Xu
    Li, Ruiyang
    Xu, Longchang
    IEEE TRANSACTIONS ON MEDICAL IMAGING, 2023, 42 (11) : 3229 - 3243
  • [39] 3D-TexSeg: Unsupervised Segmentation of 3D Texture Using Mutual Transformer Learning
    Ganapathi, Iyyakutti Iyappan
    Dharejo, Fayaz Ali
    Javed, Sajid
    Ali, Syed Sadaf
    Werghi, Naoufel
    2024 INTERNATIONAL CONFERENCE IN 3D VISION, 3DV 2024, 2024, : 506 - 515
  • [40] 3D Point Cloud Object Detection on Edge Devices for Split Computing
    Noguchi, Taisuke
    Azumi, Takuya
    2024 IEEE 3RD REAL-TIME AND INTELLIGENT EDGE COMPUTING WORKSHOP, RAGE 2024, 2024, : 6 - 11