Watt: A Write-Optimized RRAM-Based Accelerator for Attention

Cited by: 0
Authors
Zhang, Xuan [1 ]
Song, Zhuoran [1 ]
Li, Xing [1 ]
He, Zhezhi [1 ]
Jing, Naifeng [1 ]
Jiang, Li [1 ]
Liang, Xiaoyao [1 ]
Affiliations
[1] Shanghai Jiao Tong Univ, Sch Elect Informat & Elect Engn, Shanghai, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Resistive random access memory; accelerator; attention; importance; similarity; workload-aware dynamic scheduler;
DOI
10.1007/978-3-031-69766-1_8
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Attention-based models, such as Transformer and BERT, have achieved remarkable success across various tasks. However, their deployment is hindered by high memory requirements, long inference latency, and significant power consumption. One promising approach to accelerating attention is resistive random access memory (RRAM), which offers processing-in-memory (PIM) capability. However, existing RRAM-based accelerators suffer from costly write operations. Accordingly, we propose a write-optimized RRAM-based accelerator for attention-based models, dubbed Watt, which reduces the amount of intermediate data written to the RRAM-based crossbars and thereby mitigates workload imbalance among crossbars. Specifically, exploiting the importance and similarity of tokens in a sequence, we design an importance detector and a similarity detector that substantially compress the intermediate data K^T and V written to the crossbars. Moreover, because many vectors in K^T and V are pruned, the number of vectors written to the crossbars varies across inferences, leading to workload imbalance among crossbars. To tackle this issue, we propose a workload-aware dynamic scheduler comprising a top-k engine and a remapping engine. The scheduler first ranks the total write count of each crossbar and the write count of each inference using the top-k engine, then assigns inference tasks to the crossbars via the remapping engine. Experimental results show that Watt achieves average speedups of 6.5x, 4.0x, and 2.1x over the state-of-the-art accelerators Sanger, TransPIM, and ReTransformer, respectively, along with average energy savings of 18.2x, 3.2x, and 2.8x relative to the same three accelerators.
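The two mechanisms in the abstract — pruning low-value K^T/V token vectors before they are written, and greedily remapping inferences to balance per-crossbar write counts — can be illustrated with a minimal sketch. This is not the paper's implementation: the per-token `scores` input and the `keep_ratio` parameter are hypothetical stand-ins for Watt's importance/similarity detectors, and the scheduler below approximates the top-k plus remapping engines with a simple sort-then-least-loaded greedy assignment.

```python
import numpy as np

def prune_kv(K, V, scores, keep_ratio=0.5):
    """Keep only the most 'important' token vectors in K and V.

    Hypothetical stand-in for the importance/similarity detectors:
    `scores` is an assumed per-token importance estimate, and only
    the top keep_ratio fraction of tokens is written to the crossbars.
    """
    n = K.shape[0]
    k = max(1, int(n * keep_ratio))
    keep = np.argsort(scores)[-k:]  # indices of the k most important tokens
    return K[keep], V[keep]

def remap_inferences(write_counts, num_crossbars):
    """Greedy workload-aware remapping sketch.

    Rank inferences by write count (mimicking the top-k engine), then
    assign each to the currently least-loaded crossbar (mimicking the
    remapping engine), balancing total writes across crossbars.
    """
    load = [0] * num_crossbars
    assignment = {}
    # Place the heaviest inferences first so the greedy balance is tight.
    for task in sorted(range(len(write_counts)),
                       key=lambda t: -write_counts[t]):
        cb = min(range(num_crossbars), key=lambda c: load[c])
        load[cb] += write_counts[task]
        assignment[task] = cb
    return assignment, load
```

Because pruning makes each inference's write count data-dependent, a static task-to-crossbar mapping would leave some crossbars idle while others saturate; sorting the tasks first keeps the greedy least-loaded choice close to an even split.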
Pages: 107-120
Page count: 14