Watt: A Write-Optimized RRAM-Based Accelerator for Attention

Cited by: 0
Authors
Zhang, Xuan [1 ]
Song, Zhuoran [1 ]
Li, Xing [1 ]
He, Zhezhi [1 ]
Jing, Naifeng [1 ]
Jiang, Li [1 ]
Liang, Xiaoyao [1 ]
Affiliations
[1] Shanghai Jiao Tong Univ, Sch Elect Informat & Elect Engn, Shanghai, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Resistive random access memory; accelerator; attention; importance; similarity; workload-aware dynamic scheduler;
DOI
10.1007/978-3-031-69766-1_8
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
Attention-based models, such as Transformer and BERT, have achieved remarkable success across various tasks. However, their deployment is hindered by high memory requirements, long inference latency, and significant power consumption. One promising approach to accelerating attention is resistive random access memory (RRAM), which offers processing-in-memory (PIM) capability. However, existing RRAM-based accelerators grapple with costly write operations. Accordingly, we propose Watt, a write-optimized RRAM-based accelerator for attention-based models that reduces the volume of intermediate data written to the RRAM crossbars, thereby also mitigating workload imbalance among crossbars. Specifically, exploiting the importance and similarity of tokens in a sequence, we design an importance detector and a similarity detector that significantly compress the intermediate matrices K^T and V before they are written to the crossbars. Moreover, because this pruning removes many vectors from K^T and V, the number of vectors written to the crossbars varies across inferences, leading to workload imbalance among crossbars. To tackle this issue, we propose a workload-aware dynamic scheduler comprising a top-k engine and a remapping engine: the scheduler ranks the accumulated write count of each crossbar and the write count of each inference using the top-k engine, then assigns inference tasks to crossbars via the remapping engine. Experimental results show that Watt achieves average speedups of 6.5x, 4.0x, and 2.1x over the state-of-the-art accelerators Sanger, TransPIM, and ReTransformer, and average energy savings of 18.2x, 3.2x, and 2.8x, respectively.
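To make the scheduling idea concrete, here is a minimal, hypothetical Python sketch (not the paper's implementation) in the spirit of Watt's top-k and remapping engines: it ranks pending inferences by their post-pruning write counts and greedily maps each to the crossbar with the fewest accumulated writes. The function name `schedule`, its inputs, and the greedy largest-first heuristic are illustrative assumptions.

```python
# Hypothetical sketch of workload-aware scheduling across RRAM crossbars.
# Each inference writes a variable number of K^T/V vectors after
# importance/similarity pruning; we balance total writes per crossbar.
import heapq

def schedule(write_counts: list[int], num_crossbars: int) -> list[int]:
    """Map each inference (index in write_counts) to a crossbar id so
    that accumulated writes per crossbar stay roughly balanced."""
    # Min-heap of (accumulated writes, crossbar id): the least-loaded
    # crossbar is always at the top.
    load = [(0, xb) for xb in range(num_crossbars)]
    heapq.heapify(load)
    assignment = [-1] * len(write_counts)
    # Place the heaviest inferences first (greedy largest-first),
    # mirroring a top-k ranking of per-inference write counts.
    for idx in sorted(range(len(write_counts)),
                      key=lambda i: write_counts[i], reverse=True):
        writes, xb = heapq.heappop(load)
        assignment[idx] = xb
        heapq.heappush(load, (writes + write_counts[idx], xb))
    return assignment

if __name__ == "__main__":
    # Four inferences with differing post-pruning write counts,
    # mapped onto two crossbars: totals become 160 vs. 155 writes.
    print(schedule([120, 40, 95, 60], num_crossbars=2))  # -> [0, 0, 1, 1]
```

Placing the largest write workloads first keeps per-crossbar write totals close, which is exactly the imbalance the paper's scheduler targets.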
Pages: 107 - 120
Number of pages: 14
Related Papers
50 records in total
  • [1] R-Accelerator: An RRAM-Based CGRA Accelerator With Logic Contraction
    Chen, Zhengyu
    Zhou, Hai
    Gu, Jie
    IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, 2019, 27 (11) : 2655 - 2667
  • [2] Write-Optimized Skip Lists
    Bender, Michael A.
    Farach-Colton, Martin
    Johnson, Rob
    Mauras, Simon
    Mayer, Tyler
    Phillips, Cynthia A.
    Xu, Helen
    PODS'17: PROCEEDINGS OF THE 36TH ACM SIGMOD-SIGACT-SIGAI SYMPOSIUM ON PRINCIPLES OF DATABASE SYSTEMS, 2017, : 69 - 78
  • [3] Write or Not: Programming Scheme Optimization for RRAM-based Neuromorphic Computing
    Meng, Ziqi
    Sun, Yanan
    Qian, Weikang
    PROCEEDINGS OF THE 59TH ACM/IEEE DESIGN AUTOMATION CONFERENCE, DAC 2022, 2022, : 985 - 990
  • [4] Attar: RRAM-based in-memory attention accelerator with software-hardware co-optimization
    Li, Bing
    Qi, Ying
    Wang, Ying
    Han, Yinhe
    SCIENCE CHINA-INFORMATION SCIENCES, 2025, 68 (03) : 371 - 387
  • [5] ODLPIM: A Write-Optimized and Long-Lifetime ReRAM-Based Accelerator for Online Deep Learning
    Zhou, Heng
    Wu, Bing
    Cheng, Huan
    Zhao, Wei
    Wei, Xueliang
    Liu, Jinpeng
    Feng, Dan
    Tong, Wei
    2023 DESIGN, AUTOMATION & TEST IN EUROPE CONFERENCE & EXHIBITION, DATE, 2023,
  • [6] Write-Optimized Dynamic Hashing for Persistent Memory
    Nam, Moohyeon
    Cha, Hokeun
    Choi, Young-ri
    Noh, Sam H.
    Nam, Beomseok
    PROCEEDINGS OF THE 17TH USENIX CONFERENCE ON FILE AND STORAGE TECHNOLOGIES, 2019, : 31 - 44
  • [7] ASBP: Automatic Structured Bit-Pruning for RRAM-based NN Accelerator
    Qu, Songyun
    Li, Bing
    Wang, Ying
    Zhang, Lei
    2021 58TH ACM/IEEE DESIGN AUTOMATION CONFERENCE (DAC), 2021, : 745 - 750
  • [8] A Universal RRAM-Based DNN Accelerator With Programmable Crossbars Beyond MVM Operator
    Zhang, Zihan
    Jiang, Jianfei
    Zhu, Yongxin
    Wang, Qin
    Mao, Zhigang
    Jing, Naifeng
    IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, 2022, 41 (07) : 2094 - 2106
  • [9] HyAcc: A Hybrid CAM-MAC RRAM-based Accelerator for Recommendation Model
    Zhang, Xuan
    Song, Zhuoran
    Li, Xing
    He, Zhezhi
    Jiang, Li
    Jing, Naifeng
    Liang, Xiaoyao
    2023 IEEE 41ST INTERNATIONAL CONFERENCE ON COMPUTER DESIGN, ICCD, 2023, : 375 - 382