SADIMM: Accelerating Sparse Attention Using DIMM-Based Near-Memory Processing

Authors
Li, Huize [1]
Chen, Dan [1]
Mitra, Tulika [1]
Affiliations
[1] Natl Univ Singapore, Sch Comp, Singapore 19077, Singapore
Funding
National Research Foundation, Singapore
Keywords
Sparse matrices; Hardware; Memory management; Software; Parallel processing; Logic; Bandwidth; Transformers; Faces; DRAM chips; Near-memory processing; sparse attention accelerator; DRAM architecture; software-hardware co-design;
DOI
10.1109/TC.2024.3500362
Chinese Library Classification number
TP3 [Computing Technology, Computer Technology]
Discipline classification code
0812
Abstract
The self-attention mechanism is the performance bottleneck of Transformer-based language models. In response, researchers have proposed sparse attention to expedite Transformer execution. However, sparse attention involves massive random memory access, which makes it a memory-intensive kernel. Memory-centric architectures, such as near-memory processing (NMP), deliver notable performance gains for memory-intensive applications. Nonetheless, existing NMP-based sparse attention accelerators perform suboptimally because of both hardware and software challenges. On the hardware side, current solutions employ homogeneous logic integration, which struggles to support the diverse operations in sparse attention. On the software side, a token-based dataflow is commonly adopted, which leads to load imbalance after weakly connected tokens are pruned. To address these challenges, this paper introduces SADIMM, a hardware-software co-designed NMP-based sparse attention accelerator. In hardware, we propose a heterogeneous integration approach that uses different logic units for the different operations within the attention mechanism, thereby improving hardware efficiency. In software, we implement a dimension-based dataflow that divides input sequences by model dimensions, which keeps the load balanced after weakly connected tokens are pruned. Compared with an NVIDIA RTX A6000 GPU, experimental results on the BERT, BART, and GPT-2 models show that SADIMM achieves speedups of 48x, 35x, and 37x and energy-efficiency improvements of 194x, 202x, and 191x, respectively.
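The load-balancing argument in the abstract can be illustrated with a toy cost model. The sketch below is not SADIMM's implementation. It assumes that the work for one query token is proportional to the number of keys it keeps after pruning times the model dimension, and that "dividing by model dimensions" means each processing unit handles a slice of the hidden dimension for every kept token pair. All function names and the example numbers are hypothetical.

```python
def token_based_load(kept_per_token, num_units, dim):
    """Work per unit when contiguous blocks of query tokens go to each unit.

    kept_per_token[i] is the number of keys that query token i still attends
    to after pruning (an assumed, simplified cost model).
    """
    n = len(kept_per_token)
    block = (n + num_units - 1) // num_units  # ceiling division
    return [
        sum(kept_per_token[u * block:(u + 1) * block]) * dim
        for u in range(num_units)
    ]


def dimension_based_load(kept_per_token, num_units, dim):
    """Work per unit when each unit handles a slice of the model dimension.

    Every unit sees all kept (query, key) pairs, but only for its own slice
    of the hidden dimension, so pruning affects all units equally.
    """
    total_kept = sum(kept_per_token)
    base, extra = divmod(dim, num_units)
    # The first `extra` units take one more dimension when dim % num_units != 0.
    return [total_kept * (base + (1 if u < extra else 0)) for u in range(num_units)]


def imbalance(loads):
    """Busiest unit's load divided by the mean load (1.0 means balanced)."""
    mean = sum(loads) / len(loads)
    return max(loads) / mean


# Hypothetical pruning result: token 0 keeps many keys, the rest keep few.
kept = [8, 1, 1, 1, 1, 1, 1, 2]
token_loads = token_based_load(kept, num_units=4, dim=64)
dim_loads = dimension_based_load(kept, num_units=4, dim=64)

print(token_loads, imbalance(token_loads))  # [576, 128, 128, 192] 2.25
print(dim_loads, imbalance(dim_loads))      # [256, 256, 256, 256] 1.0
```

Both partitions perform the same total work in this model. Only the distribution across units differs, which is the effect the abstract attributes to the dimension-based dataflow.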
Pages: 542-554
Page count: 13