RAMAN: A Reconfigurable and Sparse tinyML Accelerator for Inference on Edge

Cited by: 2
Authors
Krishna, Adithya [1 ,2 ]
Rohit Nudurupati, Srikanth [3 ]
Chandana, D. G. [3 ]
Dwivedi, Pritesh [3 ]
van Schaik, Andre [2 ]
Mehendale, Mahesh [3 ]
Thakur, Chetan Singh [3 ]
Affiliations
[1] Indian Inst Sci, Dept Elect Syst Engn, Bengaluru 560012, India
[2] Western Sydney Univ, MARCS Inst, Int Ctr Neuromorph Syst, Penrith, NSW 2751, Australia
[3] Indian Inst Sci, Dept Elect Syst Engn, Bengaluru 560012, India
Source
IEEE INTERNET OF THINGS JOURNAL | 2024, Vol. 11, Iss. 14
Keywords
Standards; Field programmable gate arrays; Random access memory; Hardware; Costs; Convolution; System-on-chip; Convolutional neural networks (CNNs); deep learning; hardware acceleration; sparse processing; DEEP NEURAL-NETWORKS; FLEXIBLE ACCELERATOR; EFFICIENT; PROCESSOR; CNN;
DOI
10.1109/JIOT.2024.3386832
CLC Number
TP [Automation Technology, Computer Technology]
Subject Classification Code
0812
Abstract
Deep neural network (DNN)-based inference at the edge is challenging, as these compute- and data-intensive algorithms must be implemented at low cost and low power while meeting the latency constraints of the target applications. Sparsity in both activations and weights, inherent to DNNs, is a key knob to leverage. In this article, we present RAMAN, a Re-configurable and spArse tinyML Accelerator for infereNce on edge, architected to exploit sparsity to reduce area (storage), power, and latency. RAMAN can be configured to support a wide range of DNN topologies, consisting of different convolution layer types and a range of layer parameters (feature-map size and the number of channels). RAMAN can also be configured to support accuracy versus power/latency tradeoffs using techniques deployed at compile time and run time. We present the salient features of the architecture, provide implementation results, and compare them with the state of the art. RAMAN employs a novel dataflow, inspired by Gustavson's algorithm, that has optimal input activation (IA) and output activation (OA) reuse to minimize memory accesses and the overall data-movement cost. The dataflow allows RAMAN to reduce partial sums (Psums) locally within the processing-element array, eliminating Psum writeback traffic. Additionally, we propose a method to reduce peak activation memory by overlapping IA and OA in the same memory space, which can cut storage requirements by up to 50%. RAMAN was implemented on a low-power, resource-constrained Efinix Ti60 FPGA, utilizing 37.2K LUTs and 8.6K registers. RAMAN processes all layers of the MobileNetV1 model at 98.47 GOp/s/W and the DS-CNN model at 79.68 GOp/s/W by leveraging both weight and activation sparsity.
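As a rough illustration of the dataflow idea the abstract refers to (not the RAMAN implementation itself, whose details are in the paper), the following sketch shows Gustavson's row-wise sparse matrix multiplication. Each row of A is scanned once, each needed row of B is streamed once, and partial sums accumulate in a local buffer before a single writeback per output row; this locality is what motivates the IA/OA-reuse and Psum-reduction claims above. The function name and dict-of-dicts sparse format are illustrative choices.

```python
def gustavson_spgemm(A_rows, B_rows):
    """Row-wise (Gustavson) SpGEMM.

    A_rows, B_rows: sparse matrices as lists of {col: value} dicts,
    one dict per row. Returns C = A @ B in the same format.
    """
    C_rows = []
    for a_row in A_rows:
        acc = {}  # local partial-sum accumulator: no Psum writeback traffic
        for k, a_val in a_row.items():          # nonzeros of this A row
            for j, b_val in B_rows[k].items():  # nonzeros of B row k
                acc[j] = acc.get(j, 0) + a_val * b_val
        C_rows.append(acc)                      # one writeback per output row
    return C_rows

# Example: A = [[1, 0], [0, 2]], B = [[0, 3], [4, 0]]
A = [{0: 1}, {1: 2}]
B = [{1: 3}, {0: 4}]
print(gustavson_spgemm(A, B))  # [{1: 3}, {0: 8}]
```

Note that only nonzero entries are ever touched, so compute and memory traffic scale with the sparsity of the operands rather than their dense dimensions.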
Pages: 24831-24845 (15 pages)
Related Papers (50 total)
  • [1] FPNA: A Reconfigurable Accelerator for AI Inference at the Edge
    Gadfort, Peter
    Ayorinde, Oluseyi A.
    34TH IEEE INTERNATIONAL SYSTEM ON CHIP CONFERENCE (SOCC), 2021, : 242 - 247
  • [2] Live Demonstration: Real-time audio and visual inference on the RAMAN TinyML accelerator
    Krishna, Adithya
    Rajesh, Ashwin
    Oleti, Hitesh Pavan
    Chauhan, Anand
    Shankaranarayanan, H.
    van Schaik, Andre
    Mehendale, Mahesh
    Thakur, Chetan Singh
    2024 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS, ISCAS 2024, 2024,
  • [3] Energy-Efficient Inference on the Edge Exploiting TinyML Capabilities for UAVs
    Raza, Wamiq
    Osman, Anas
    Ferrini, Francesco
    De Natale, Francesco
    DRONES, 2021, 5 (04)
  • [4] ALRESCHA: A Lightweight Reconfigurable Sparse-Computation Accelerator
    Asgari, Bahar
    Hadidi, Ramyad
    Krishna, Tushar
    Kim, Hyesoon
    Yalamanchili, Sudhakar
    2020 IEEE INTERNATIONAL SYMPOSIUM ON HIGH PERFORMANCE COMPUTER ARCHITECTURE (HPCA 2020), 2020, : 249 - 260
  • [5] Reconfigurable Intelligent Surface for Green Edge Inference
    Hua, Sheng
    Zhou, Yong
    Yang, Kai
    Shi, Yuanming
    Wang, Kunlun
    IEEE TRANSACTIONS ON GREEN COMMUNICATIONS AND NETWORKING, 2021, 5 (02): : 964 - 979
  • [6] SparseAdapt: Runtime Control for Sparse Linear Algebra on a Reconfigurable Accelerator
    Pal, Subhankar
    Amarnath, Aporva
    Feng, Siying
    O'Boyle, Michael
    Dreslinski, Ronald
    Dubach, Christophe
    PROCEEDINGS OF 54TH ANNUAL IEEE/ACM INTERNATIONAL SYMPOSIUM ON MICROARCHITECTURE, MICRO 2021, 2021, : 1005 - 1021
  • [7] Sparse Optimization for Green Edge AI Inference
    Yang, Xiang Yu
    Hua, Sheng
    Shi, Yuan Ming
    Wang, Hao
    Zhang, Jun
    Letaief, Khaled B.
    JOURNAL OF COMMUNICATIONS AND INFORMATION NETWORKS, 2020, 5 (01) : 1 - 15
  • [8] EdgeDRNN: Recurrent Neural Network Accelerator for Edge Inference
    Gao, Chang
    Rios-Navarro, Antonio
    Chen, Xi
    Liu, Shih-Chii
    Delbruck, Tobi
    IEEE JOURNAL ON EMERGING AND SELECTED TOPICS IN CIRCUITS AND SYSTEMS, 2020, 10 (04) : 419 - 432
  • [9] ViTA: A Vision Transformer Inference Accelerator for Edge Applications
    Nag, Shashank
    Datta, Gourav
    Kundu, Souvik
    Chandrachoodan, Nitin
    Beerel, Peter A.
    2023 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS, ISCAS, 2023,
  • [10] Hardware Accelerator Design for Sparse DNN Inference and Training: A Tutorial
    Mao, Wendong
    Wang, Meiqi
    Xie, Xiaoru
    Wu, Xiao
    Wang, Zhongfeng
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II-EXPRESS BRIEFS, 2024, 71 (03) : 1708 - 1714