SADIMM: Accelerating Sparse Attention Using DIMM-Based Near-Memory Processing

Cited by: 0
Authors
Li, Huize [1 ]
Chen, Dan [1 ]
Mitra, Tulika [1 ]
Affiliations
[1] Natl Univ Singapore, Sch Comp, Singapore 119077, Singapore
Funding
National Research Foundation, Singapore
Keywords
Sparse matrices; Hardware; Memory management; Software; Parallel processing; Logic; Bandwidth; Transformers; Faces; DRAM chips; Near-memory processing; sparse attention accelerator; DRAM architecture; software-hardware co-design
DOI
10.1109/TC.2024.3500362
Chinese Library Classification
TP3 [Computing Technology, Computer Technology]
Discipline Classification Code
0812
Abstract
The self-attention mechanism is the performance bottleneck of Transformer-based language models. In response, researchers have proposed sparse attention to expedite Transformer execution. However, sparse attention involves massive random memory access, rendering it a memory-intensive kernel. Memory-centric architectures, such as near-memory processing (NMP), deliver notable performance gains on memory-intensive applications. Nonetheless, existing NMP-based sparse attention accelerators achieve suboptimal performance due to both hardware and software challenges. On the hardware front, current solutions employ homogeneous logic integration and struggle to support the diverse operations in sparse attention. On the software side, token-based dataflow is commonly adopted, leading to load imbalance after weakly connected tokens are pruned. To address these challenges, this paper introduces SADIMM, a hardware-software co-designed, NMP-based sparse attention accelerator. In hardware, we propose a heterogeneous integration approach that efficiently supports the various operations within the attention mechanism, employing different logic units for different operations and thereby improving hardware efficiency. In software, we implement a dimension-based dataflow that divides input sequences along the model dimension, achieving load balance after the pruning of weakly connected tokens. Compared to an NVIDIA RTX A6000 GPU, experimental results on the BERT, BART, and GPT-2 models demonstrate that SADIMM achieves 48x, 35x, and 37x speedups and 194x, 202x, and 191x energy-efficiency improvements, respectively.
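The load-balancing argument in the abstract can be made concrete with a small sketch. The Python snippet below is a hypothetical illustration, not the authors' implementation: the function names, the synthetic per-token pruning pattern, and the work model (one multiply-accumulate per surviving score per output column) are all our own assumptions. It contrasts the two dataflows: token-based shards inherit the skew of the pruning pattern, whereas dimension-based shards assign every rank all surviving scores but only a slice of the model dimension, so per-rank work is even by construction.

```python
import numpy as np

def prune_weak_tokens(scores: np.ndarray, keep_per_token: np.ndarray) -> np.ndarray:
    """Keep only the strongest keep_per_token[i] connections of query token i,
    zeroing the rest (a synthetic stand-in for sparse-attention pruning)."""
    pruned = np.zeros_like(scores)
    for i, k in enumerate(keep_per_token):
        top = np.argsort(scores[i])[-k:]  # indices of the k largest scores
        pruned[i, top] = scores[i, top]
    return pruned

def token_based_load(scores: np.ndarray, n_ranks: int) -> list[int]:
    """Token-based dataflow: shard query tokens (rows) across NMP ranks.
    Per-rank work = number of surviving score entries in the shard."""
    return [int(np.count_nonzero(s)) for s in np.array_split(scores, n_ranks, axis=0)]

def dimension_based_load(scores: np.ndarray, d_model: int, n_ranks: int) -> list[int]:
    """Dimension-based dataflow: every rank processes all surviving scores,
    but only d_model / n_ranks output columns of the value matrix."""
    nnz = int(np.count_nonzero(scores))
    return [nnz * len(c) for c in np.array_split(np.arange(d_model), n_ranks)]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    seq_len, d_model, ranks = 256, 64, 8
    # Skewed survival counts: a few tokens keep many connections, most keep few.
    keep = (rng.random(seq_len) ** 3 * seq_len).astype(int) + 1
    scores = prune_weak_tokens(rng.random((seq_len, seq_len)), keep)
    print("token-based    :", token_based_load(scores, ranks))
    print("dimension-based:", dimension_based_load(scores, d_model, ranks))
```

Running the sketch prints visibly unequal per-rank counts for the token-based split and an identical count for every rank under the dimension-based split, which is the imbalance the paper's dimension-based dataflow is designed to avoid.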
Pages: 542-554
Page count: 13