SAMBA: Sparsity Aware In-Memory Computing Based Machine Learning Accelerator

Cited by: 1
Authors
Kim, Dong Eun [1 ]
Ankit, Aayush [2 ]
Wang, Cheng [3 ]
Roy, Kaushik [1 ]
Affiliations
[1] Purdue Univ, Dept Elect & Comp Engn, W Lafayette, IN 47907 USA
[2] Microsoft, San Jose, CA 95112 USA
[3] Iowa State Univ, Dept Elect & Comp Engn, Ames, IA 50011 USA
Funding
U.S. National Science Foundation;
Keywords
Computer architecture; Optimization; Load management; Convolution; In-memory computing; Hardware; Energy consumption; Accelerator; in-memory computing; neural networks; sparsity; COPROCESSOR;
DOI
10.1109/TC.2023.3257513
CLC Number
TP3 [Computing Technology, Computer Technology];
Discipline Code
0812 ;
Abstract
Machine Learning (ML) inference is typically dominated by highly data-intensive Matrix Vector Multiplication (MVM) computations, which can be constrained by a memory bottleneck due to massive data movement between processor and memory. Although analog in-memory computing (IMC) ML accelerators have been proposed to execute MVM with high efficiency, the latency and energy of such computing systems can be dominated by the large latency and energy costs of analog-to-digital converters (ADCs). By leveraging sparsity in ML workloads, reconfigurable ADCs can save MVM energy and latency by reducing the required ADC bit precision. However, such latency improvement can be hindered by the non-uniform sparsity of the weight matrices mapped into hardware. Moreover, data movement between MVM processing cores may become another factor that degrades overall system-level performance. To address these issues, we propose SAMBA, a Sparsity Aware IMC Based Machine Learning Accelerator. First, we propose load balancing during the mapping of weight matrices into physical crossbars to eliminate non-uniformity in the sparsity of the mapped matrices. Second, we propose optimizations in arranging and scheduling the tiled MVM hardware to minimize the overhead of data movement across multiple processing cores. Our evaluations show that the proposed load balancing technique achieves a performance improvement, and the proposed optimizations further improve both performance and energy efficiency regardless of sparsity condition. Combining load balancing and data movement optimization in conjunction with reconfigurable ADCs, our proposed approach provides up to 2.38x speed-up and 1.54x energy efficiency over state-of-the-art analog IMC-based ML accelerators for the ImageNet dataset on the ResNet-50 architecture.
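The load-balancing idea in the abstract can be illustrated with a minimal sketch: assign weight-matrix rows to crossbars so that each crossbar receives a roughly equal number of nonzero weights, rather than mapping contiguous row blocks. This is an assumed greedy heuristic for illustration only (function and variable names are hypothetical, not from the paper):

```python
import numpy as np

def balance_rows(weight, n_xbars):
    """Greedily assign matrix rows to crossbars so the total nonzero
    count per crossbar is roughly equal (heaviest rows placed first,
    each into the currently least-loaded crossbar)."""
    nnz = np.count_nonzero(weight, axis=1)   # nonzeros per row
    order = np.argsort(nnz)[::-1]            # heaviest rows first
    loads = np.zeros(n_xbars, dtype=int)     # running nonzero load per crossbar
    assignment = {i: [] for i in range(n_xbars)}
    for r in order:
        x = int(np.argmin(loads))            # pick least-loaded crossbar
        assignment[x].append(int(r))
        loads[x] += int(nnz[r])
    return assignment, loads
```

Under a balanced mapping, the worst-case per-crossbar work (and thus the ADC latency dictated by the busiest crossbar) is bounded much more tightly than under a naive contiguous mapping of a non-uniformly sparse matrix.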
Pages: 2615 - 2627
Page count: 13
Related Papers
50 records in total
  • [1] In-Memory Computing for Machine Learning and Deep Learning
    Lepri, N.
    Glukhov, A.
    Cattaneo, L.
    Farronato, M.
    Mannocci, P.
    Ielmini, D.
    [J]. IEEE JOURNAL OF THE ELECTRON DEVICES SOCIETY, 2023, 11 : 587 - 601
  • [2] In-Memory Computing based Machine Learning Accelerators: Opportunities and Challenges
    Roy, Kaushik
    [J]. PROCEEDINGS OF THE 32ND GREAT LAKES SYMPOSIUM ON VLSI 2022, GLSVLSI 2022, 2022, : 203 - 204
  • [3] MULTIFUNCTIONAL RRAM CHIP WITH CONFIGURABILITY FOR SPARSITY-AWARE IN-MEMORY ISING MACHINE
    Yue, Wenshuo
    Jing, Zhaokun
    Yan, Bonan
    Tao, Yaoyu
    Zhang, Teng
    Huang, Ru
    Yang, Yuchao
    [J]. CONFERENCE OF SCIENCE & TECHNOLOGY FOR INTEGRATED CIRCUITS, 2024 CSTIC, 2024,
  • [4] In-Memory Computing in Emerging Memory Technologies for Machine Learning: An Overview
    Roy, Kaushik
    Chakraborty, Indranil
    Ali, Mustafa
    Ankit, Aayush
    Agrawal, Amogh
    [J]. PROCEEDINGS OF THE 2020 57TH ACM/EDAC/IEEE DESIGN AUTOMATION CONFERENCE (DAC), 2020,
  • [5] Circuits and Architectures for In-Memory Computing-Based Machine Learning Accelerators
    Ankit, Aayush
    Chakraborty, Indranil
    Agrawal, Amogh
    Ali, Mustafa
    Roy, Kaushik
    [J]. IEEE MICRO, 2020, 40 (06) : 8 - 21
  • [6] A review of in-memory computing for machine learning: architectures, options
    Snasel, Vaclav
    Dang, Tran Khanh
    Kueng, Josef
    Kong, Lingping
    [J]. INTERNATIONAL JOURNAL OF WEB INFORMATION SYSTEMS, 2024, 20 (01) : 24 - 47
  • [7] In-Memory Computing based Accelerator for Transformer Networks for Long Sequences
    Laguna, Ann Franchesca
    Kazemi, Arman
    Niemier, Michael
    Hu, X. Sharon
    [J]. PROCEEDINGS OF THE 2021 DESIGN, AUTOMATION & TEST IN EUROPE CONFERENCE & EXHIBITION (DATE 2021), 2021, : 1839 - 1844
  • [8] CMOS Annealing Machine: an In-memory Computing Accelerator to Process Combinatorial Optimization Problems
    Yamaoka, Masanao
    Okuyama, Takuya
    Hayashi, Masato
    Yoshimura, Chihiro
    Takemoto, Takashi
    [J]. 2019 IEEE CUSTOM INTEGRATED CIRCUITS CONFERENCE (CICC), 2019,
  • [9] Deep learning acceleration based on in-memory computing
    Eleftheriou, E.
    Le Gallo, M.
    Nandakumar, S. R.
    Piveteau, C.
    Boybat, I
    Joshi, V
    Khaddam-Aljameh, R.
    Dazzi, M.
    Giannopoulos, I
    Karunaratne, G.
    Kersting, B.
    Stanisavljevic, M.
    Jonnalagadda, V. P.
    Ioannou, N.
    Kourtis, K.
    Francese, P. A.
    Sebastian, A.
    [J]. IBM JOURNAL OF RESEARCH AND DEVELOPMENT, 2019, 63 (06)
  • [10] In-Memory Computing Architectures for Big Data and Machine Learning Applications
    Snasel, Vaclav
    Tran Khanh Dang
    Pham, Phuong N. H.
    Kueng, Josef
    Kong, Lingping
    [J]. FUTURE DATA AND SECURITY ENGINEERING. BIG DATA, SECURITY AND PRIVACY, SMART CITY AND INDUSTRY 4.0 APPLICATIONS, FDSE 2022, 2022, 1688 : 19 - 33