RAMAN: A Reconfigurable and Sparse tinyML Accelerator for Inference on Edge

Cited by: 2
Authors
Krishna, Adithya [1 ,2 ]
Rohit Nudurupati, Srikanth [3 ]
Chandana, D. G. [3 ]
Dwivedi, Pritesh [3 ]
van Schaik, Andre [2 ]
Mehendale, Mahesh [3 ]
Thakur, Chetan Singh [3 ]
Affiliations
[1] Indian Inst Sci, Dept Elect Syst Engn, Bengaluru 560012, India
[2] Western Sydney Univ, MARCS Inst, Int Ctr Neuromorph Syst, Penrith, NSW 2751, Australia
[3] Indian Inst Sci, Dept Elect Syst Engn, Bengaluru 560012, India
Source
IEEE INTERNET OF THINGS JOURNAL | 2024, Vol. 11, No. 14
Keywords
Standards; Field programmable gate arrays; Random access memory; Hardware; Costs; Convolution; System-on-chip; Convolutional neural networks (CNNs); deep learning; hardware acceleration; sparse processing; DEEP NEURAL-NETWORKS; FLEXIBLE ACCELERATOR; EFFICIENT; PROCESSOR; CNN;
DOI
10.1109/JIOT.2024.3386832
CLC number
TP [Automation and Computer Technology]
Discipline code
0812
Abstract
Deep neural network (DNN)-based inference at the edge is challenging, as these compute- and data-intensive algorithms must be implemented at low cost and low power while meeting the latency constraints of the target applications. Sparsity, inherent to both the activations and weights of DNNs, is a key knob to leverage. In this article, we present RAMAN, a Re-configurable and spArse tinyML Accelerator for infereNce on edge, architected to exploit sparsity to reduce area (storage), power, and latency. RAMAN can be configured to support a wide range of DNN topologies, consisting of different convolution layer types and a range of layer parameters (feature-map size and number of channels). RAMAN can also be configured to trade accuracy against power/latency using techniques deployed at compile time and run time. We present the salient features of the architecture, provide implementation results, and compare them with the state of the art. RAMAN employs a novel dataflow inspired by Gustavson's algorithm that has optimal input activation (IA) and output activation (OA) reuse to minimize memory accesses and the overall data-movement cost. The dataflow allows RAMAN to reduce partial sums (Psums) locally within the processing-element array, eliminating Psum writeback traffic. Additionally, we propose a method to reduce peak activation memory by overlapping IA and OA in the same memory space, which can cut storage requirements by up to 50%. RAMAN was implemented on a low-power, resource-constrained Efinix Ti60 FPGA, utilizing 37.2K LUTs and 8.6K registers. By leveraging both weight and activation sparsity, RAMAN processes all layers of the MobileNetV1 model at 98.47 GOp/s/W and the DS-CNN model at 79.68 GOp/s/W.
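The Gustavson-style dataflow the abstract credits for its Psum behavior can be illustrated with a minimal software sketch. This is not RAMAN's implementation — the function name and the `(col, value)` sparse-row encoding are assumptions for illustration — but it shows the key property: each output row's partial sums are accumulated in a local buffer (analogous to in-array reduction) and written back exactly once, so no Psum traffic leaves the accumulator mid-computation.

```python
def gustavson_spmm(A_rows, B_rows):
    """Row-wise sparse matrix multiply (Gustavson's algorithm).

    A_rows / B_rows: list of sparse rows, each a list of (col, value) pairs.
    Returns C = A @ B in the same sparse-row format.
    """
    C = []
    for a_row in A_rows:
        acc = {}  # local partial-sum accumulator, one output row at a time
        for k, a_val in a_row:          # nonzeros of A's current row
            for j, b_val in B_rows[k]:  # nonzeros of B's row k (row reuse)
                acc[j] = acc.get(j, 0) + a_val * b_val
        C.append(sorted(acc.items()))   # single writeback per output row
    return C
```

Because both loops iterate only over stored nonzeros, zero activations and zero weights are skipped entirely — the same mechanism by which a sparse accelerator converts sparsity into latency and energy savings.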
Pages: 24831-24845
Page count: 15