Multi-Node Acceleration for Large-Scale GCNs

Cited by: 4
Authors
Sun, Gongjian [1 ,2 ]
Yan, Mingyu [1 ,2 ]
Wang, Duo [1 ,2 ]
Li, Han [1 ,2 ]
Li, Wenming [1 ,2 ]
Ye, Xiaochun [1 ,2 ]
Fan, Dongrui [1 ,2 ]
Xie, Yuan [3 ]
Affiliations
[1] Chinese Acad Sci, Inst Comp Technol, State Key Lab Processors, Beijing 100045, Peoples R China
[2] Univ Chinese Acad Sci, Beijing 101408, Peoples R China
[3] Univ Calif Santa Barbara, Santa Barbara, CA 93106 USA
Funding
National Natural Science Foundation of China;
Keywords
Deep learning; graph neural network; hardware accelerator; multi-node system; communication optimization; NETWORK;
DOI
10.1109/TC.2022.3207127
CLC number
TP3 [computing technology; computer technology];
Discipline code
0812 ;
Abstract
Limited by memory capacity and computation power, single-node graph convolutional neural network (GCN) accelerators cannot complete the execution of GCNs within a reasonable amount of time, due to the explosive size of today's graphs. Thus, large-scale GCNs call for a multi-node acceleration system (MultiAccSys), analogous to the tensor processing unit (TPU) Pod for large-scale neural networks. In this work, we aim to scale up single-node GCN accelerators to accelerate GCNs on large-scale graphs. We first identify the communication pattern and challenges of multi-node acceleration for GCNs on large-scale graphs. We observe that (1) irregular coarse-grained communication patterns exist in the execution of GCNs in MultiAccSys, which introduces a massive amount of redundant network transmissions and off-chip memory accesses; (2) the acceleration of GCNs in MultiAccSys is mainly bounded by network bandwidth but tolerates network latency. Guided by these observations, we then propose MultiGCN, an efficient MultiAccSys for large-scale GCNs that trades network latency for network bandwidth. Specifically, by leveraging the network latency tolerance, we first propose a topology-aware multicast mechanism with a one-put-per-multicast message-passing model to reduce transmissions and alleviate network bandwidth requirements. Second, we introduce a scatter-based round execution mechanism which cooperates with the multicast mechanism and reduces redundant off-chip memory accesses. Compared to the baseline MultiAccSys, MultiGCN achieves a 4~12x speedup using only 28%~68% of the energy, while reducing network transmissions by 32% and off-chip memory accesses by 73% on average. Besides, MultiGCN not only achieves a 2.5~8x speedup over the state-of-the-art multi-GPU solution, but also scales to large-scale graphs, as opposed to single-node GCN accelerators.
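The intuition behind the topology-aware multicast described in the abstract can be illustrated with a small counting sketch. This is not the paper's implementation; it is a hypothetical model assuming a 2D-mesh interconnect with dimension-order (X-then-Y) routing, and all function names are illustrative. Unicast pays every link of every path; a multicast tree that shares overlapping route prefixes carries the payload over each link only once.

```python
# Hypothetical sketch: link traffic of naive unicast vs. a topology-aware
# multicast tree on a 2D-mesh multi-node system (assumed topology, not
# taken from the paper). Routing is dimension-order: X first, then Y.

def xy_route(src, dst):
    """Yield the links of a dimension-order (X-first) route on a 2D mesh."""
    x, y = src
    step = 1 if dst[0] >= x else -1
    while x != dst[0]:
        yield ((x, y), (x + step, y))
        x += step
    step = 1 if dst[1] >= y else -1
    while y != dst[1]:
        yield ((x, y), (x, y + step))
        y += step

def unicast_traffic(src, dsts):
    """One point-to-point message per destination: shared links pay repeatedly."""
    return sum(1 for d in dsts for _ in xy_route(src, d))

def multicast_traffic(src, dsts):
    """One put per multicast: overlapping route prefixes are merged into a
    tree, so each tree link carries the payload exactly once."""
    return len({link for d in dsts for link in xy_route(src, d)})

src, dsts = (0, 0), [(2, 0), (2, 1), (2, 2)]
print(unicast_traffic(src, dsts))    # 9 link traversals
print(multicast_traffic(src, dsts))  # 4 link traversals
```

The gap between the two counts grows with the number of destinations sharing a route prefix, which is why the multicast mechanism reduces the network bandwidth pressure that the paper identifies as the main bottleneck.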
Pages: 3140 - 3152
Page count: 13