RAMAN: A Reconfigurable and Sparse tinyML Accelerator for Inference on Edge

Cited by: 2
Authors
Krishna, Adithya [1 ,2 ]
Rohit Nudurupati, Srikanth [3 ]
Chandana, D. G. [3 ]
Dwivedi, Pritesh [3 ]
van Schaik, Andre [2 ]
Mehendale, Mahesh [3 ]
Thakur, Chetan Singh [3 ]
Affiliations
[1] Indian Inst Sci, Dept Elect Syst Engn, Bengaluru 560012, India
[2] Western Sydney Univ, MARCS Inst, Int Ctr Neuromorph Syst, Penrith, NSW 2751, Australia
[3] Indian Inst Sci, Dept Elect Syst Engn, Bengaluru 560012, India
Source
IEEE INTERNET OF THINGS JOURNAL | 2024, Vol. 11, No. 14
Keywords
Standards; Field programmable gate arrays; Random access memory; Hardware; Costs; Convolution; System-on-chip; Convolutional neural networks (CNNs); deep learning; hardware acceleration; sparse processing; DEEP NEURAL-NETWORKS; FLEXIBLE ACCELERATOR; EFFICIENT; PROCESSOR; CNN;
DOI
10.1109/JIOT.2024.3386832
CLC number
TP [Automation and Computer Technology]
Discipline code
0812
Abstract
Deep neural network (DNN)-based inference at the edge is challenging, as these compute- and data-intensive algorithms must be implemented at low cost and low power while meeting the latency constraints of the target applications. Sparsity, inherent to both the activations and weights of DNNs, is a key knob to leverage. In this article, we present RAMAN, a Re-configurable and spArse tinyML Accelerator for infereNce on edge, architected to exploit sparsity to reduce area (storage), power, and latency. RAMAN can be configured to support a wide range of DNN topologies, consisting of different convolution layer types and a range of layer parameters (feature-map size and number of channels). RAMAN can also be configured to trade accuracy against power/latency using techniques deployed at compile time and run time. We present the salient features of the architecture, provide implementation results, and compare them with the state of the art. RAMAN employs a novel dataflow inspired by Gustavson's algorithm that has optimal input activation (IA) and output activation (OA) reuse to minimize memory accesses and the overall data-movement cost. The dataflow allows RAMAN to reduce partial sums (Psums) locally within the processing-element array, eliminating Psum writeback traffic. Additionally, we propose a method to reduce peak activation memory by overlapping IA and OA in the same memory space, which can cut storage requirements by up to 50%. RAMAN was implemented on a low-power, resource-constrained Efinix Ti60 FPGA, utilizing 37.2K LUTs and 8.6K registers. By leveraging both weight and activation sparsity, RAMAN processes all layers of the MobileNetV1 model at 98.47 GOp/s/W and the DS-CNN model at 79.68 GOp/s/W.
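The Gustavson-style dataflow the abstract credits for its Psum behavior can be illustrated with a minimal software sketch. This is not RAMAN's implementation — the function name and the `(col, value)` sparse-row encoding are assumptions for illustration — but it shows the key property: each output row's partial sums are accumulated in a local buffer (analogous to in-array reduction) and written back exactly once, so no Psum traffic leaves the accumulator mid-computation.

```python
def gustavson_spmm(A_rows, B_rows):
    """Row-wise sparse matrix multiply (Gustavson's algorithm).

    A_rows / B_rows: list of sparse rows, each a list of (col, value) pairs.
    Returns C = A @ B in the same sparse-row format.
    """
    C = []
    for a_row in A_rows:
        acc = {}  # local partial-sum accumulator, one output row at a time
        for k, a_val in a_row:          # nonzeros of A's current row
            for j, b_val in B_rows[k]:  # nonzeros of B's row k (row reuse)
                acc[j] = acc.get(j, 0) + a_val * b_val
        C.append(sorted(acc.items()))   # single writeback per output row
    return C
```

Because both loops iterate only over stored nonzeros, zero activations and zero weights are skipped entirely — the same mechanism by which a sparse accelerator converts sparsity into latency and energy savings.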
Pages: 24831-24845
Page count: 15