Watt: A Write-Optimized RRAM-Based Accelerator for Attention

Cited by: 0
Authors
Zhang, Xuan [1 ]
Song, Zhuoran [1 ]
Li, Xing [1 ]
He, Zhezhi [1 ]
Jing, Naifeng [1 ]
Jiang, Li [1 ]
Liang, Xiaoyao [1 ]
Affiliations
[1] Shanghai Jiao Tong Univ, Sch Elect Informat & Elect Engn, Shanghai, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Resistive random access memory; accelerator; attention; importance; similarity; workload-aware dynamic scheduler;
DOI
10.1007/978-3-031-69766-1_8
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Attention-based models, such as Transformer and BERT, have achieved remarkable success across various tasks. However, their deployment is hindered by high memory requirements, long inference latency, and significant power consumption. One promising approach to accelerating attention is resistive random access memory (RRAM), which offers processing-in-memory (PIM) capability. However, existing RRAM-based accelerators suffer from costly write operations. Accordingly, we propose a write-optimized RRAM-based accelerator for attention-based models, dubbed Watt, which reduces the amount of intermediate data written to the RRAM-based crossbars and thereby mitigates workload imbalance among crossbars. Specifically, exploiting the importance and similarity of tokens in a sequence, we design an importance detector and a similarity detector that substantially compress the intermediate data K^T and V written to the crossbars. Moreover, because many vectors in K^T and V are pruned, the number of vectors written to the crossbars varies across inferences, leading to workload imbalance among crossbars. To tackle this issue, we propose a workload-aware dynamic scheduler comprising a top-k engine and a remapping engine. The scheduler first ranks the total write count of each crossbar and the write count of each inference using the top-k engine, then assigns inference tasks to the crossbars via the remapping engine. Experimental results show that Watt achieves average speedups of 6.5x, 4.0x, and 2.1x over the state-of-the-art accelerators Sanger, TransPIM, and ReTransformer, respectively, along with average energy savings of 18.2x, 3.2x, and 2.8x relative to the same three accelerators.
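The two mechanisms in the abstract — pruning low-value K^T/V token vectors before they are written, and greedily remapping inferences to balance per-crossbar write counts — can be illustrated with a minimal sketch. This is not the paper's implementation: the per-token `scores` input and the `keep_ratio` parameter are hypothetical stand-ins for Watt's importance/similarity detectors, and the scheduler below approximates the top-k plus remapping engines with a simple sort-then-least-loaded greedy assignment.

```python
import numpy as np

def prune_kv(K, V, scores, keep_ratio=0.5):
    """Keep only the most 'important' token vectors in K and V.

    Hypothetical stand-in for the importance/similarity detectors:
    `scores` is an assumed per-token importance estimate, and only
    the top keep_ratio fraction of tokens is written to the crossbars.
    """
    n = K.shape[0]
    k = max(1, int(n * keep_ratio))
    keep = np.argsort(scores)[-k:]  # indices of the k most important tokens
    return K[keep], V[keep]

def remap_inferences(write_counts, num_crossbars):
    """Greedy workload-aware remapping sketch.

    Rank inferences by write count (mimicking the top-k engine), then
    assign each to the currently least-loaded crossbar (mimicking the
    remapping engine), balancing total writes across crossbars.
    """
    load = [0] * num_crossbars
    assignment = {}
    # Place the heaviest inferences first so the greedy balance is tight.
    for task in sorted(range(len(write_counts)),
                       key=lambda t: -write_counts[t]):
        cb = min(range(num_crossbars), key=lambda c: load[c])
        load[cb] += write_counts[task]
        assignment[task] = cb
    return assignment, load
```

Because pruning makes each inference's write count data-dependent, a static task-to-crossbar mapping would leave some crossbars idle while others saturate; sorting the tasks first keeps the greedy least-loaded choice close to an even split.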
Pages: 107-120
Page count: 14