SAMBA: Sparsity Aware In-Memory Computing Based Machine Learning Accelerator

Cited by: 1
Authors
Kim, Dong Eun [1 ]
Ankit, Aayush [2 ]
Wang, Cheng [3 ]
Roy, Kaushik [1 ]
Affiliations
[1] Purdue Univ, Dept Elect & Comp Engn, W Lafayette, IN 47907 USA
[2] Microsoft, San Jose, CA 95112 USA
[3] Iowa State Univ, Dept Elect & Comp Engn, Ames, IA 50011 USA
Funding
National Science Foundation (USA)
Keywords
Computer architecture; Optimization; Load management; Convolution; In-memory computing; Hardware; Energy consumption; Accelerator; in-memory computing; neural networks; sparsity; COPROCESSOR;
DOI
10.1109/TC.2023.3257513
CLC Number
TP3 [Computing technology; computer technology]
Discipline Code
0812
Abstract
Machine Learning (ML) inference is typically dominated by highly data-intensive Matrix Vector Multiplication (MVM) computations that can be constrained by the memory bottleneck caused by massive data movement between processor and memory. Although analog in-memory computing (IMC) ML accelerators have been proposed to execute MVM with high efficiency, the latency and energy of such computing systems can be dominated by the large latency and energy costs of analog-to-digital converters (ADCs). By leveraging sparsity in ML workloads, reconfigurable ADCs can save MVM energy and latency by reducing the required ADC bit precision. However, this latency improvement can be hindered by the non-uniform sparsity of the weight matrices mapped into hardware. Moreover, data movement between MVM processing cores can become another factor that degrades overall system-level performance. To address these issues, we propose SAMBA, a Sparsity Aware IMC Based Machine Learning Accelerator. First, we propose load balancing during the mapping of weight matrices onto physical crossbars, eliminating non-uniformity in the sparsity of the mapped matrices. Second, we propose optimizations in arranging and scheduling the tiled MVM hardware to minimize the overhead of data movement across multiple processing cores. Our evaluations show that the proposed load-balancing technique improves performance, and the proposed optimizations further improve both performance and energy efficiency regardless of the sparsity condition. Combining load balancing and data-movement optimization with reconfigurable ADCs, our approach provides up to 2.38x speed-up and 1.54x better energy efficiency over state-of-the-art analog IMC-based ML accelerators on the ImageNet dataset with the ResNet-50 architecture.
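The two ideas in the abstract can be illustrated with a small sketch. This is not the paper's actual mapping algorithm, only an assumed greedy heuristic: rows of a sparse weight matrix are assigned to crossbars so that nonzero counts (and hence ADC workload) stay even, and the ADC bit precision is derived from the number of active rows summing on a bitline. The function names `balance_rows` and `adc_bits` are illustrative inventions.

```python
import math

def balance_rows(weight_rows, num_crossbars):
    """Greedily assign matrix rows to crossbars so the nonzero count per
    crossbar is balanced (longest-processing-time heuristic). Illustrative
    sketch only -- not SAMBA's exact load-balancing scheme."""
    nnz = [sum(1 for w in row if w != 0) for row in weight_rows]
    # Place the densest rows first.
    order = sorted(range(len(weight_rows)), key=lambda i: nnz[i], reverse=True)
    loads = [0] * num_crossbars
    assignment = [[] for _ in range(num_crossbars)]
    for i in order:
        target = loads.index(min(loads))  # least-loaded crossbar so far
        assignment[target].append(i)
        loads[target] += nnz[i]
    return assignment, loads

def adc_bits(active_rows):
    """With fewer active (nonzero) rows accumulating on a bitline, the
    partial-sum range shrinks, so a reconfigurable ADC can use fewer bits."""
    return max(1, math.ceil(math.log2(active_rows + 1)))
```

Balancing nonzeros across crossbars matters because the slowest (densest) crossbar sets the MVM latency; with even sparsity, every crossbar can run its ADC at a similarly reduced precision.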
Pages: 2615-2627
Page count: 13