SADIMM: Accelerating Sparse Attention Using DIMM-Based Near-Memory Processing

Authors
Li, Huize [1]
Chen, Dan [1]
Mitra, Tulika [1]
Affiliations
[1] Natl Univ Singapore, Sch Comp, Singapore 19077, Singapore
Funding
National Research Foundation, Singapore
Keywords
Sparse matrices; Hardware; Memory management; Software; Parallel processing; Logic; Bandwidth; Transformers; Faces; DRAM chips; Near-memory processing; sparse attention accelerator; DRAM architecture; software-hardware co-design;
DOI
10.1109/TC.2024.3500362
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
The self-attention mechanism is the performance bottleneck of Transformer-based language models. In response, researchers have proposed sparse attention to speed up Transformer execution. However, sparse attention involves massive random memory access, which makes it a memory-intensive kernel. Memory-based architectures, such as near-memory processing (NMP), deliver notable performance gains on memory-intensive applications. Nonetheless, existing NMP-based sparse attention accelerators perform suboptimally because of both hardware and software challenges. On the hardware side, current solutions integrate homogeneous logic, which struggles to support the diverse operations in sparse attention. On the software side, they commonly adopt a token-based dataflow, which leads to load imbalance after weakly connected tokens are pruned. To address these challenges, this paper introduces SADIMM, a hardware-software co-designed, NMP-based sparse attention accelerator. In hardware, we propose a heterogeneous integration approach that uses different logic units for different operations within the attention mechanism, thereby improving hardware efficiency. In software, we implement a dimension-based dataflow that divides input sequences by model dimension, which balances the load after weakly connected tokens are pruned. Compared with an NVIDIA RTX A6000 GPU, experimental results on the BERT, BART, and GPT-2 models show that SADIMM achieves 48x, 35x, and 37x speedups and 194x, 202x, and 191x energy-efficiency improvements, respectively.
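The load-balancing argument in the abstract can be illustrated with a small sketch. This is not the paper's dataflow or hardware model: the pruning mask, the unit count, the cost model (one multiply-accumulate per feature dimension per surviving query-key pair), and the function names `token_partition_loads` and `dimension_partition_loads` are all assumptions made for illustration. The sketch also ignores the cost of reducing partial dot products across units, which a dimension-based split would require.

```python
# Hypothetical cost model: each surviving (query, key) pair costs d_model
# multiply-accumulates (MACs) for its dot product.

def token_partition_loads(mask, num_units, d_model):
    """MACs per unit when query tokens (mask rows) are split into contiguous blocks."""
    rows_per_unit = len(mask) // num_units
    loads = []
    for u in range(num_units):
        block = mask[u * rows_per_unit:(u + 1) * rows_per_unit]
        surviving_pairs = sum(sum(row) for row in block)
        loads.append(surviving_pairs * d_model)
    return loads

def dimension_partition_loads(mask, num_units, d_model):
    """MACs per unit when every unit holds an equal slice of the feature dimensions."""
    dims_per_unit = d_model // num_units
    surviving_pairs = sum(sum(row) for row in mask)
    # Every unit sees every surviving pair, but only its own slice of dimensions.
    return [surviving_pairs * dims_per_unit for _ in range(num_units)]

# Illustrative (made-up) pruning result: query i keeps i + 1 keys out of 8.
n_tokens, n_units, d_model = 8, 4, 16
mask = [[1 if j <= i else 0 for j in range(n_tokens)] for i in range(n_tokens)]

token_loads = token_partition_loads(mask, n_units, d_model)
dim_loads = dimension_partition_loads(mask, n_units, d_model)

print("token-based loads:    ", token_loads)
print("dimension-based loads:", dim_loads)
```

Under these assumptions both partitions perform the same total work, but the token-based split gives each unit a share that depends on how many keys its queries retained, while the dimension-based split gives every unit the same share regardless of the pruning pattern.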
Pages: 542-554
Number of pages: 13