HeNCoG: A Heterogeneous Near-memory Computing Architecture for Energy Efficient GCN Acceleration

Cited by: 0

Authors:

Hwang, Seung-Eon [1]

Song, Duyeong [1]

Park, Jongsun [1]

Affiliations:

[1] Korea University, School of Electrical Engineering, Seoul, South Korea

Source:

2024 IEEE International Symposium on Circuits and Systems (ISCAS 2024) | 2024

Funding:

National Research Foundation, Singapore;

Keywords:

Graph Convolutional Network; Sparse Matrix Multiplication; Near-memory Computing; Domain Specific Accelerator; PERFORMANCE;

DOI:

10.1109/ISCAS58744.2024.10558133

Chinese Library Classification (CLC) number:

TP39 [Computer applications];

Discipline classification code:

081203 ; 0835 ;

Abstract:

Graph convolutional network (GCN), which first applies convolutional operations to process graph data, has gained attention in various tasks involving relational data. Previous GCN accelerators have been designed either with heterogeneous cores, reflecting the two stages of inference (aggregation and combination), or with a unified core that treats multi-layer inference as an iterative sparse-dense matrix multiplication. However, those prior works suffer from an unnecessarily large number of multiply-accumulate (MAC) operations and/or main memory accesses. In this paper, we propose HeNCoG, a GCN accelerator that utilizes a heterogeneous MAC array core for the combination stage and a near-memory computing core for the aggregation stage. In HeNCoG, because the number of MAC operations is significantly reduced when the stage execution order is changed, the combination stage is executed first with a row-stationary dataflow. In the aggregation stage, magneto-resistive random-access memory (MRAM)-based near-memory computing is employed to reduce the number of main memory accesses needed to read the adjacency matrix of the graph dataset. Graph partitioning and double buffering techniques are also applied to further improve hardware efficiency. Simulation results show that the HeNCoG architecture reduces execution cycles by 97% and memory accesses by 42% compared to previous works.
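
The effect of the stage execution order on the MAC count can be illustrated with a rough operation count. The sketch below is not taken from the paper: it assumes a single GCN layer of the form A·X·W, treats the feature matrix X and weight matrix W as dense, counts one MAC per nonzero of the adjacency matrix A per feature column, and uses made-up graph sizes. The function names and parameters are hypothetical.

```python
def macs_aggregation_first(nnz_a, n_nodes, f_in, f_out):
    """MAC count for (A @ X) @ W, with A sparse and X, W treated as dense."""
    aggregation = nnz_a * f_in            # each nonzero of A scales one f_in-wide row of X
    combination = n_nodes * f_in * f_out  # dense (n_nodes x f_in) times (f_in x f_out)
    return aggregation + combination


def macs_combination_first(nnz_a, n_nodes, f_in, f_out):
    """MAC count for A @ (X @ W): same result, different operation order."""
    combination = n_nodes * f_in * f_out  # dense (n_nodes x f_in) times (f_in x f_out)
    aggregation = nnz_a * f_out           # each nonzero of A scales one f_out-wide row of X @ W
    return combination + aggregation


# Hypothetical sizes, chosen only for illustration (not from the paper).
n_nodes, nnz_a, f_in, f_out = 1000, 5000, 512, 16

agg_first = macs_aggregation_first(nnz_a, n_nodes, f_in, f_out)
comb_first = macs_combination_first(nnz_a, n_nodes, f_in, f_out)

print("aggregation first:", agg_first)
print("combination first:", comb_first)
```

Under these assumptions the combination cost is identical in both orders, and the saving comes entirely from the aggregation stage operating on f_out-wide rows instead of f_in-wide rows, so the benefit grows as the layer shrinks the feature dimension. Real feature matrices are often sparse, which this simplified count ignores.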

Pages: 5
