Watt: A Write-Optimized RRAM-Based Accelerator for Attention

Cited by: 0
Authors
Zhang, Xuan [1 ]
Song, Zhuoran [1 ]
Li, Xing [1 ]
He, Zhezhi [1 ]
Jing, Naifeng [1 ]
Jiang, Li [1 ]
Liang, Xiaoyao [1 ]
Affiliations
[1] Shanghai Jiao Tong Univ, Sch Elect Informat & Elect Engn, Shanghai, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Resistive random access memory; accelerator; attention; importance; similarity; workload-aware dynamic scheduler;
DOI
10.1007/978-3-031-69766-1_8
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
Attention-based models, such as Transformer and BERT, have achieved remarkable success across various tasks. However, their deployment is hindered by high memory requirements, long inference latency, and significant power consumption. One promising approach to accelerating attention is resistive random access memory (RRAM), which offers processing-in-memory (PIM) capability. However, existing RRAM-based accelerators grapple with costly write operations. Accordingly, we propose Watt, a write-optimized RRAM-based accelerator for attention-based models that reduces the volume of intermediate data written to the RRAM crossbars, thereby also mitigating workload imbalance among crossbars. Specifically, exploiting the importance and similarity of tokens in a sequence, we design an importance detector and a similarity detector that significantly compress the intermediate matrices K^T and V before they are written to the crossbars. Moreover, because this pruning removes many vectors from K^T and V, the number of vectors written to the crossbars varies across inferences, leading to workload imbalance among crossbars. To tackle this issue, we propose a workload-aware dynamic scheduler comprising a top-k engine and a remapping engine: the scheduler ranks the accumulated write count of each crossbar and the write count of each inference using the top-k engine, then assigns inference tasks to crossbars via the remapping engine. Experimental results show that Watt achieves average speedups of 6.5x, 4.0x, and 2.1x over the state-of-the-art accelerators Sanger, TransPIM, and ReTransformer, and average energy savings of 18.2x, 3.2x, and 2.8x, respectively.
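To make the scheduling idea concrete, here is a minimal, hypothetical Python sketch (not the paper's implementation) in the spirit of Watt's top-k and remapping engines: it ranks pending inferences by their post-pruning write counts and greedily maps each to the crossbar with the fewest accumulated writes. The function name `schedule`, its inputs, and the greedy largest-first heuristic are illustrative assumptions.

```python
# Hypothetical sketch of workload-aware scheduling across RRAM crossbars.
# Each inference writes a variable number of K^T/V vectors after
# importance/similarity pruning; we balance total writes per crossbar.
import heapq

def schedule(write_counts: list[int], num_crossbars: int) -> list[int]:
    """Map each inference (index in write_counts) to a crossbar id so
    that accumulated writes per crossbar stay roughly balanced."""
    # Min-heap of (accumulated writes, crossbar id): the least-loaded
    # crossbar is always at the top.
    load = [(0, xb) for xb in range(num_crossbars)]
    heapq.heapify(load)
    assignment = [-1] * len(write_counts)
    # Place the heaviest inferences first (greedy largest-first),
    # mirroring a top-k ranking of per-inference write counts.
    for idx in sorted(range(len(write_counts)),
                      key=lambda i: write_counts[i], reverse=True):
        writes, xb = heapq.heappop(load)
        assignment[idx] = xb
        heapq.heappush(load, (writes + write_counts[idx], xb))
    return assignment

if __name__ == "__main__":
    # Four inferences with differing post-pruning write counts,
    # mapped onto two crossbars: totals become 160 vs. 155 writes.
    print(schedule([120, 40, 95, 60], num_crossbars=2))  # -> [0, 0, 1, 1]
```

Placing the largest write workloads first keeps per-crossbar write totals close, which is exactly the imbalance the paper's scheduler targets.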
Pages: 107 - 120
Number of pages: 14
Related Papers
50 records in total
  • [1] R-Accelerator: An RRAM-Based CGRA Accelerator With Logic Contraction
    Chen, Zhengyu
    Zhou, Hai
    Gu, Jie
    IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, 2019, 27 (11) : 2655 - 2667
  • [2] Write-Optimized Skip Lists
    Bender, Michael A.
    Farach-Colton, Martin
    Johnson, Rob
    Mauras, Simon
    Mayer, Tyler
    Phillips, Cynthia A.
    Xu, Helen
    PODS'17: PROCEEDINGS OF THE 36TH ACM SIGMOD-SIGACT-SIGAI SYMPOSIUM ON PRINCIPLES OF DATABASE SYSTEMS, 2017, : 69 - 78
  • [3] Write or Not: Programming Scheme Optimization for RRAM-based Neuromorphic Computing
    Meng, Ziqi
    Sun, Yanan
    Qian, Weikang
    PROCEEDINGS OF THE 59TH ACM/IEEE DESIGN AUTOMATION CONFERENCE, DAC 2022, 2022, : 985 - 990
  • [4] Attar: RRAM-based in-memory attention accelerator with software-hardware co-optimization
    Li, Bing
    Qi, Ying
    Wang, Ying
    Han, Yinhe
    SCIENCE CHINA-INFORMATION SCIENCES, 2025, 68 (03) : 371 - 387
  • [5] ODLPIM: A Write-Optimized and Long-Lifetime ReRAM-Based Accelerator for Online Deep Learning
    Zhou, Heng
    Wu, Bing
    Cheng, Huan
    Zhao, Wei
    Wei, Xueliang
    Liu, Jinpeng
    Feng, Dan
    Tong, Wei
    2023 DESIGN, AUTOMATION & TEST IN EUROPE CONFERENCE & EXHIBITION, DATE, 2023,
  • [6] Write-Optimized Dynamic Hashing for Persistent Memory
    Nam, Moohyeon
    Cha, Hokeun
    Choi, Young-ri
    Noh, Sam H.
    Nam, Beomseok
    PROCEEDINGS OF THE 17TH USENIX CONFERENCE ON FILE AND STORAGE TECHNOLOGIES, 2019, : 31 - 44
  • [7] ASBP: Automatic Structured Bit-Pruning for RRAM-based NN Accelerator
    Qu, Songyun
    Li, Bing
    Wang, Ying
    Zhang, Lei
    2021 58TH ACM/IEEE DESIGN AUTOMATION CONFERENCE (DAC), 2021, : 745 - 750
  • [8] A Universal RRAM-Based DNN Accelerator With Programmable Crossbars Beyond MVM Operator
    Zhang, Zihan
    Jiang, Jianfei
    Zhu, Yongxin
    Wang, Qin
    Mao, Zhigang
    Jing, Naifeng
    IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, 2022, 41 (07) : 2094 - 2106
  • [9] HyAcc: A Hybrid CAM-MAC RRAM-based Accelerator for Recommendation Model
    Zhang, Xuan
    Song, Zhuoran
    Li, Xing
    He, Zhezhi
    Jiang, Li
    Jing, Naifeng
    Liang, Xiaoyao
    2023 IEEE 41ST INTERNATIONAL CONFERENCE ON COMPUTER DESIGN, ICCD, 2023, : 375 - 382