Multi-Node Acceleration for Large-Scale GCNs

Cited by: 4

Authors:

Sun, Gongjian [1,2]

Yan, Mingyu [1,2]

Wang, Duo [1,2]

Li, Han [1,2]

Li, Wenming [1,2]

Ye, Xiaochun [1,2]

Fan, Dongrui [1,2]

Xie, Yuan [3]

Affiliations:

[1] Chinese Acad Sci, Inst Comp Technol, State Key Lab Processors, Beijing 100045, Peoples R China

[2] Univ Chinese Acad Sci, Beijing 101408, Peoples R China

[3] Univ Calif Santa Barbara, Santa Barbara, CA 93106 USA

Source:

IEEE TRANSACTIONS ON COMPUTERS | 2022 / Vol. 71 / No. 12

Funding:

National Natural Science Foundation of China;

Keywords:

Deep learning; graph neural network; hardware accelerator; multi-node system; communication optimization; NETWORK;

DOI:

10.1109/TC.2022.3207127

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812;

Abstract:

Limited by memory capacity and computation power, single-node graph convolutional neural network (GCN) accelerators cannot complete the execution of GCNs within a reasonable amount of time, owing to the explosive size of today's graphs. Large-scale GCNs therefore call for a multi-node acceleration system (MultiAccSys), like the Tensor Processing Unit (TPU) Pod for large-scale neural networks. In this work, we aim to scale up single-node GCN accelerators to accelerate GCNs on large-scale graphs. We first identify the communication patterns and challenges of multi-node acceleration for GCNs on large-scale graphs. We observe that (1) irregular coarse-grained communication patterns exist in the execution of GCNs in a MultiAccSys, introducing a massive amount of redundant network transmissions and off-chip memory accesses; and (2) the acceleration of GCNs in a MultiAccSys is mainly bounded by network bandwidth but tolerates network latency. Guided by these observations, we propose MultiGCN, an efficient MultiAccSys for large-scale GCNs that trades network latency for network bandwidth. Specifically, by leveraging network latency tolerance, we first propose a topology-aware multicast mechanism with a one-put-per-multicast message-passing model to reduce transmissions and alleviate network bandwidth requirements. Second, we introduce a scatter-based round execution mechanism that cooperates with the multicast mechanism and reduces redundant off-chip memory accesses. Compared to the baseline MultiAccSys, MultiGCN achieves a 4x to 12x speedup using only 28% to 68% of the energy, while reducing transmissions by 32% and off-chip memory accesses by 73% on average. In addition, MultiGCN not only achieves a 2.5x to 8x speedup over the state-of-the-art multi-GPU solution, but also scales to large-scale graphs, unlike single-node GCN accelerators.
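The abstract does not describe how the multicast is routed, so the following is only a toy model of why sending one copy per network link, rather than one copy per destination, reduces transmissions. It assumes a single source node, a 2D mesh, and X-then-Y dimension-order routing, with packets replicated where routes diverge. Under these assumptions the routes from one source form a tree, so each link carries the payload once. The names `xy_route`, `unicast_link_traversals`, and `multicast_link_traversals` are hypothetical illustrations, not part of MultiGCN.

```python
from typing import Iterable, List, Set, Tuple

Coord = Tuple[int, int]      # (x, y) position of a node in a 2D mesh (assumed topology)
Link = Tuple[Coord, Coord]   # directed link between neighbouring nodes


def xy_route(src: Coord, dst: Coord) -> List[Link]:
    """Links used by X-then-Y dimension-order routing from src to dst."""
    links: List[Link] = []
    x, y = src
    step_x = 1 if dst[0] > x else -1
    while x != dst[0]:  # travel along the X dimension first
        links.append(((x, y), (x + step_x, y)))
        x += step_x
    step_y = 1 if dst[1] > y else -1
    while y != dst[1]:  # then along the Y dimension
        links.append(((x, y), (x, y + step_y)))
        y += step_y
    return links


def unicast_link_traversals(src: Coord, dsts: Iterable[Coord]) -> int:
    """One copy per destination: every route is paid for in full."""
    return sum(len(xy_route(src, d)) for d in dsts)


def multicast_link_traversals(src: Coord, dsts: Iterable[Coord]) -> int:
    """One copy per link: destinations whose routes share a link share one copy."""
    used: Set[Link] = set()
    for d in dsts:
        used.update(xy_route(src, d))
    return len(used)


if __name__ == "__main__":
    src = (0, 0)
    dsts = [(3, 0), (3, 1), (3, 2), (2, 2)]
    print("unicast:", unicast_link_traversals(src, dsts))      # unicast: 16
    print("multicast:", multicast_link_traversals(src, dsts))  # multicast: 7
```

In this example the row segment from (0, 0) to (2, 0) is shared by all four routes and the column link from (3, 0) to (3, 1) by two, so the multicast tree needs 7 link traversals where separate unicasts need 16. The actual savings reported for MultiGCN depend on its own topology and graph workloads.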

Pages: 3140-3152

Number of pages: 13