RAMAN: A Reconfigurable and Sparse tinyML Accelerator for Inference on Edge

Cited by: 2
Authors
Krishna, Adithya [1 ,2 ]
Rohit Nudurupati, Srikanth [3 ]
Chandana, D. G. [3 ]
Dwivedi, Pritesh [3 ]
van Schaik, Andre [2 ]
Mehendale, Mahesh [3 ]
Thakur, Chetan Singh [3 ]
Affiliations
[1] Indian Inst Sci, Dept Elect Syst Engn, Bengaluru 560012, India
[2] Western Sydney Univ, MARCS Inst, Int Ctr Neuromorph Syst, Penrith, NSW 2751, Australia
[3] Indian Inst Sci, Dept Elect Syst Engn, Bengaluru 560012, India
Source
IEEE INTERNET OF THINGS JOURNAL | 2024, Vol. 11, Iss. 14
Keywords
Standards; Field programmable gate arrays; Random access memory; Hardware; Costs; Convolution; System-on-chip; Convolutional neural networks (CNNs); deep learning; hardware acceleration; sparse processing; DEEP NEURAL-NETWORKS; FLEXIBLE ACCELERATOR; EFFICIENT; PROCESSOR; CNN;
DOI
10.1109/JIOT.2024.3386832
CLC Number
TP [Automation Technology, Computer Technology]
Subject Classification Code
0812
Abstract
Deep neural network (DNN)-based inference at the edge is challenging, as these compute- and data-intensive algorithms must be implemented at low cost and low power while meeting the latency constraints of the target applications. Sparsity in both activations and weights, inherent to DNNs, is a key knob to leverage. In this article, we present RAMAN, a Re-configurable and spArse tinyML Accelerator for infereNce on edge, architected to exploit sparsity to reduce area (storage), power, and latency. RAMAN can be configured to support a wide range of DNN topologies, consisting of different convolution layer types and a range of layer parameters (feature-map size and the number of channels). RAMAN can also be configured to support accuracy versus power/latency tradeoffs using techniques deployed at compile time and run time. We present the salient features of the architecture, provide implementation results, and compare them with the state of the art. RAMAN employs a novel dataflow, inspired by Gustavson's algorithm, that has optimal input activation (IA) and output activation (OA) reuse to minimize memory accesses and the overall data-movement cost. The dataflow allows RAMAN to reduce partial sums (Psums) locally within the processing-element array, eliminating Psum writeback traffic. Additionally, we propose a method to reduce peak activation memory by overlapping IA and OA in the same memory space, which can cut storage requirements by up to 50%. RAMAN was implemented on a low-power, resource-constrained Efinix Ti60 FPGA, utilizing 37.2K LUTs and 8.6K registers. RAMAN processes all layers of the MobileNetV1 model at 98.47 GOp/s/W and the DS-CNN model at 79.68 GOp/s/W by leveraging both weight and activation sparsity.
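As a rough illustration of the dataflow idea the abstract refers to (not the RAMAN implementation itself, whose details are in the paper), the following sketch shows Gustavson's row-wise sparse matrix multiplication. Each row of A is scanned once, each needed row of B is streamed once, and partial sums accumulate in a local buffer before a single writeback per output row; this locality is what motivates the IA/OA-reuse and Psum-reduction claims above. The function name and dict-of-dicts sparse format are illustrative choices.

```python
def gustavson_spgemm(A_rows, B_rows):
    """Row-wise (Gustavson) SpGEMM.

    A_rows, B_rows: sparse matrices as lists of {col: value} dicts,
    one dict per row. Returns C = A @ B in the same format.
    """
    C_rows = []
    for a_row in A_rows:
        acc = {}  # local partial-sum accumulator: no Psum writeback traffic
        for k, a_val in a_row.items():          # nonzeros of this A row
            for j, b_val in B_rows[k].items():  # nonzeros of B row k
                acc[j] = acc.get(j, 0) + a_val * b_val
        C_rows.append(acc)                      # one writeback per output row
    return C_rows

# Example: A = [[1, 0], [0, 2]], B = [[0, 3], [4, 0]]
A = [{0: 1}, {1: 2}]
B = [{1: 3}, {0: 4}]
print(gustavson_spgemm(A, B))  # [{1: 3}, {0: 8}]
```

Note that only nonzero entries are ever touched, so compute and memory traffic scale with the sparsity of the operands rather than their dense dimensions.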
Pages: 24831-24845 (15 pages)
Related Papers (50 total)
  • [1] FPNA: A Reconfigurable Accelerator for AI Inference at the Edge
    Gadfort, Peter
    Ayorinde, Oluseyi A.
    34TH IEEE INTERNATIONAL SYSTEM ON CHIP CONFERENCE (SOCC), 2021, : 242 - 247
  • [2] Live Demonstration: Real-time audio and visual inference on the RAMAN TinyML accelerator
    Krishna, Adithya
    Rajesh, Ashwin
    Oleti, Hitesh Pavan
    Chauhan, Anand
    Shankaranarayanan, H.
    van Schaik, Andre
    Mehendale, Mahesh
    Thakur, Chetan Singh
    2024 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS, ISCAS 2024, 2024,
  • [3] Energy-Efficient Inference on the Edge Exploiting TinyML Capabilities for UAVs
    Raza, Wamiq
    Osman, Anas
    Ferrini, Francesco
    De Natale, Francesco
    DRONES, 2021, 5 (04)
  • [4] ALRESCHA: A Lightweight Reconfigurable Sparse-Computation Accelerator
    Asgari, Bahar
    Hadidi, Ramyad
    Krishna, Tushar
    Kim, Hyesoon
    Yalamanchili, Sudhakar
    2020 IEEE INTERNATIONAL SYMPOSIUM ON HIGH PERFORMANCE COMPUTER ARCHITECTURE (HPCA 2020), 2020, : 249 - 260
  • [5] Reconfigurable Intelligent Surface for Green Edge Inference
    Hua, Sheng
    Zhou, Yong
    Yang, Kai
    Shi, Yuanming
    Wang, Kunlun
    IEEE TRANSACTIONS ON GREEN COMMUNICATIONS AND NETWORKING, 2021, 5 (02): : 964 - 979
  • [6] SparseAdapt: Runtime Control for Sparse Linear Algebra on a Reconfigurable Accelerator
    Pal, Subhankar
    Amarnath, Aporva
    Feng, Siying
    O'Boyle, Michael
    Dreslinski, Ronald
    Dubach, Christophe
    PROCEEDINGS OF 54TH ANNUAL IEEE/ACM INTERNATIONAL SYMPOSIUM ON MICROARCHITECTURE, MICRO 2021, 2021, : 1005 - 1021
  • [7] Sparse Optimization for Green Edge AI Inference
    Yang, Xiang Yu
    Hua, Sheng
    Shi, Yuan Ming
    Wang, Hao
    Zhang, Jun
    Letaief, Khaled B.
    JOURNAL OF COMMUNICATIONS AND INFORMATION NETWORKS, 2020, 5 (01) : 1 - 15
  • [8] EdgeDRNN: Recurrent Neural Network Accelerator for Edge Inference
    Gao, Chang
    Rios-Navarro, Antonio
    Chen, Xi
    Liu, Shih-Chii
    Delbruck, Tobi
    IEEE JOURNAL ON EMERGING AND SELECTED TOPICS IN CIRCUITS AND SYSTEMS, 2020, 10 (04) : 419 - 432
  • [9] ViTA: A Vision Transformer Inference Accelerator for Edge Applications
    Nag, Shashank
    Datta, Gourav
    Kundu, Souvik
    Chandrachoodan, Nitin
    Beerel, Peter A.
    2023 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS, ISCAS, 2023,
  • [10] Hardware Accelerator Design for Sparse DNN Inference and Training: A Tutorial
    Mao, Wendong
    Wang, Meiqi
    Xie, Xiaoru
    Wu, Xiao
    Wang, Zhongfeng
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II-EXPRESS BRIEFS, 2024, 71 (03) : 1708 - 1714