Watt: A Write-Optimized RRAM-Based Accelerator for Attention

Cited by: 0
Authors
Zhang, Xuan [1 ]
Song, Zhuoran [1 ]
Li, Xing [1 ]
He, Zhezhi [1 ]
Jing, Naifeng [1 ]
Jiang, Li [1 ]
Liang, Xiaoyao [1 ]
Affiliations
[1] Shanghai Jiao Tong Univ, Sch Elect Informat & Elect Engn, Shanghai, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Resistive random access memory; accelerator; attention; importance; similarity; workload-aware dynamic scheduler;
DOI
10.1007/978-3-031-69766-1_8
CLC Classification Number
TP18 [Theory of Artificial Intelligence];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Attention-based models, such as Transformer and BERT, have achieved remarkable success across various tasks. However, their deployment is hindered by high memory requirements, long inference latency, and significant power consumption. One promising way to accelerate attention is resistive random access memory (RRAM), which offers processing-in-memory (PIM) capability. However, existing RRAM-based accelerators often grapple with costly write operations. Accordingly, we propose Watt, a write-optimized RRAM-based accelerator for attention-based models that reduces the amount of intermediate data written to the RRAM-based crossbars and effectively mitigates the resulting workload imbalance among crossbars. Specifically, exploiting the importance and similarity of tokens in a sequence, we design an importance detector and a similarity detector to significantly compress the intermediate data K^T and V written to the crossbars. Moreover, because numerous vectors in K^T and V are pruned, the number of vectors written to the crossbars varies across inferences, leading to workload imbalance among crossbars. To tackle this issue, we propose a workload-aware dynamic scheduler comprising a top-k engine and a remapping engine. The scheduler first ranks the accumulated write count of each crossbar and the write count of each inference using the top-k engine, and then assigns inference tasks to the crossbars via the remapping engine. Experimental results show that Watt achieves average speedups of 6.5x, 4.0x, and 2.1x over the state-of-the-art accelerators Sanger, TransPIM, and ReTransformer, respectively, and average energy savings of 18.2x, 3.2x, and 2.8x relative to the same three accelerators.
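The abstract describes the workload-aware dynamic scheduler only at a high level, so the following is a minimal Python sketch of one plausible interpretation: inferences are ranked by their post-pruning write counts and greedily remapped to the crossbar with the smallest accumulated write count. The function name `schedule_inferences`, the greedy longest-first policy, and the data layout are illustrative assumptions, not Watt's actual top-k engine or remapping hardware.

```python
import heapq

def schedule_inferences(task_write_counts, num_crossbars):
    """Greedy workload-balancing sketch (an assumption, not the paper's design):
    assign each inference's write workload to the crossbar with the smallest
    accumulated write count so far.

    task_write_counts : list[int]  # writes each inference will issue
                                   # (varies because K^T / V vectors are pruned)
    num_crossbars     : int        # number of RRAM crossbars available
    Returns a mapping: crossbar index -> list of assigned inference indices.
    """
    # Rank inferences by descending write count (a software stand-in for the
    # ranking the top-k engine would provide in hardware).
    ranked_tasks = sorted(range(len(task_write_counts)),
                          key=lambda t: task_write_counts[t], reverse=True)

    # Min-heap of (accumulated writes, crossbar id) keeps the least-written
    # crossbar at the top.
    heap = [(0, xb) for xb in range(num_crossbars)]
    heapq.heapify(heap)

    assignment = {xb: [] for xb in range(num_crossbars)}
    for t in ranked_tasks:
        load, xb = heapq.heappop(heap)                       # least-loaded crossbar
        assignment[xb].append(t)                             # "remapping": place task here
        heapq.heappush(heap, (load + task_write_counts[t], xb))
    return assignment

# Example: 6 inferences with uneven write counts after pruning, 3 crossbars.
print(schedule_inferences([120, 80, 75, 40, 30, 10], 3))
```

The min-heap simply mirrors in software the ranking that the paper attributes to the top-k engine; the greedy longest-first assignment is one standard heuristic for keeping per-crossbar write totals close to each other.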
Pages: 107-120
Page count: 14