Efficient memristor accelerator for transformer self-attention functionality

Cited by: 0
|
Authors
Bettayeb, Meriem [1 ,2 ]
Halawani, Yasmin [3 ]
Khan, Muhammad Umair [1 ]
Saleh, Hani [1 ]
Mohammad, Baker [1 ]
Affiliations
[1] Khalifa Univ, Syst Onchip Lab, Comp & Informat Engn, Abu Dhabi, U Arab Emirates
[2] Abu Dhabi Univ, Coll Engn, Comp Sci & Informat Technol Dept, Abu Dhabi, U Arab Emirates
[3] Univ Dubai, Coll Engn & IT, Dubai, U Arab Emirates
Source
SCIENTIFIC REPORTS | 2024, Vol. 14, Issue 1
Keywords
ARCHITECTURE;
DOI
10.1038/s41598-024-75021-z
Chinese Library Classification (CLC)
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];
Discipline Codes
07; 0710; 09;
Abstract
The adoption of transformer networks has experienced a notable surge in various AI applications. However, the increased computational complexity, stemming primarily from the self-attention mechanism, parallels the manner in which convolution operations constrain the capabilities and speed of convolutional neural networks (CNNs). The self-attention algorithm, specifically its matrix-matrix multiplication (MatMul) operations, demands substantial memory and computation, thereby restricting the overall performance of the transformer. This paper introduces an efficient hardware accelerator for the transformer network, leveraging memristor-based in-memory computing. The design targets the memory bottleneck associated with MatMul operations in the self-attention process, utilizing approximate analog computation and the highly parallel computations facilitated by the memristor crossbar architecture. Remarkably, this approach resulted in a reduction of approximately 10 times in the number of multiply-accumulate (MAC) operations in transformer networks, while maintaining 95.47% accuracy on the MNIST dataset, as validated by a comprehensive circuit simulation employing NeuroSim 3.0. Simulation outcomes indicate an area utilization of 6895.7 μm², a latency of 15.52 s, an energy consumption of 3 mJ, and a leakage power of 59.55 μW. The methodology outlined in this paper represents a substantial stride towards a hardware-friendly transformer architecture for edge devices, poised to achieve real-time performance.
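For orientation, the sketch below lays out the self-attention MatMul operations that the abstract identifies as the memory bottleneck and that the accelerator maps onto memristor crossbars. It is a minimal NumPy illustration only: the single-head formulation, function name, and the sequence and embedding sizes are assumptions made for the example, not parameters taken from the paper.

# Minimal sketch of the self-attention MatMuls targeted by the accelerator.
# All names and sizes here are illustrative assumptions, not from the paper.
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product attention.

    Each step below is a matrix-matrix multiplication (MatMul); these are
    the operations an in-memory crossbar accelerator would offload, while
    the softmax is typically handled digitally.
    """
    q = x @ w_q                                  # MatMul 1: queries
    k = x @ w_k                                  # MatMul 2: keys
    v = x @ w_v                                  # MatMul 3: values
    scores = (q @ k.T) / np.sqrt(q.shape[-1])    # MatMul 4: Q.K^T, scaled
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over keys
    return weights @ v                           # MatMul 5: weighted values

# Illustrative sizes only: a 16-token sequence with 64-dimensional embeddings.
rng = np.random.default_rng(0)
x = rng.standard_normal((16, 64))
w_q, w_k, w_v = (rng.standard_normal((64, 64)) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)
print(out.shape)  # (16, 64)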
Pages: 15