Efficient memristor accelerator for transformer self-attention functionality

Cited by: 0
|
Authors
Bettayeb, Meriem [1 ,2 ]
Halawani, Yasmin [3 ]
Khan, Muhammad Umair [1 ]
Saleh, Hani [1 ]
Mohammad, Baker [1 ]
Affiliations
[1] Khalifa Univ, Syst Onchip Lab, Comp & Informat Engn, Abu Dhabi, U Arab Emirates
[2] Abu Dhabi Univ, Coll Engn, Comp Sci & Informat Technol Dept, Abu Dhabi, U Arab Emirates
[3] Univ Dubai, Coll Engn & IT, Dubai, U Arab Emirates
Source
SCIENTIFIC REPORTS | 2024, Vol. 14, Issue 1
Keywords
ARCHITECTURE;
DOI
10.1038/s41598-024-75021-z
Chinese Library Classification (CLC)
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];
Discipline Codes
07; 0710; 09;
Abstract
The adoption of transformer networks has experienced a notable surge in various AI applications. However, the increased computational complexity, stemming primarily from the self-attention mechanism, parallels the manner in which convolution operations constrain the capabilities and speed of convolutional neural networks (CNNs). The self-attention algorithm, specifically its matrix-matrix multiplication (MatMul) operations, demands substantial memory and computation, thereby restricting the overall performance of the transformer. This paper introduces an efficient hardware accelerator for the transformer network, leveraging memristor-based in-memory computing. The design targets the memory bottleneck associated with MatMul operations in the self-attention process, utilizing approximate analog computation and the highly parallel computations facilitated by the memristor crossbar architecture. Remarkably, this approach resulted in a reduction of approximately 10 times in the number of multiply-accumulate (MAC) operations in transformer networks, while maintaining 95.47% accuracy on the MNIST dataset, as validated by a comprehensive circuit simulation employing NeuroSim 3.0. Simulation outcomes indicate an area utilization of 6895.7 μm², a latency of 15.52 s, an energy consumption of 3 mJ, and a leakage power of 59.55 μW. The methodology outlined in this paper represents a substantial stride towards a hardware-friendly transformer architecture for edge devices, poised to achieve real-time performance.
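For orientation, the sketch below lays out the self-attention MatMul operations that the abstract identifies as the memory bottleneck and that the accelerator maps onto memristor crossbars. It is a minimal NumPy illustration only: the single-head formulation, function name, and the sequence and embedding sizes are assumptions made for the example, not parameters taken from the paper.

# Minimal sketch of the self-attention MatMuls targeted by the accelerator.
# All names and sizes here are illustrative assumptions, not from the paper.
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product attention.

    Each step below is a matrix-matrix multiplication (MatMul); these are
    the operations an in-memory crossbar accelerator would offload, while
    the softmax is typically handled digitally.
    """
    q = x @ w_q                                  # MatMul 1: queries
    k = x @ w_k                                  # MatMul 2: keys
    v = x @ w_v                                  # MatMul 3: values
    scores = (q @ k.T) / np.sqrt(q.shape[-1])    # MatMul 4: Q.K^T, scaled
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over keys
    return weights @ v                           # MatMul 5: weighted values

# Illustrative sizes only: a 16-token sequence with 64-dimensional embeddings.
rng = np.random.default_rng(0)
x = rng.standard_normal((16, 64))
w_q, w_k, w_v = (rng.standard_normal((64, 64)) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)
print(out.shape)  # (16, 64)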
Pages: 15