Multi-Node Acceleration for Large-Scale GCNs

Cited by: 4
Authors
Sun, Gongjian [1, 2]
Yan, Mingyu [1, 2]
Wang, Duo [1, 2]
Li, Han [1, 2]
Li, Wenming [1, 2]
Ye, Xiaochun [1, 2]
Fan, Dongrui [1, 2]
Xie, Yuan [3]
Affiliations
[1] Chinese Acad Sci, Inst Comp Technol, State Key Lab Processors, Beijing 100045, Peoples R China
[2] Univ Chinese Acad Sci, Beijing 101408, Peoples R China
[3] Univ Calif Santa Barbara, Santa Barbara, CA 93106 USA
Funding
National Natural Science Foundation of China
Keywords
Deep learning; graph neural network; hardware accelerator; multi-node system; communication optimization; NETWORK;
DOI
10.1109/TC.2022.3207127
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Limited by memory capacity and computation power, single-node graph convolutional neural network (GCN) accelerators cannot complete the execution of GCNs within a reasonable amount of time, due to the explosive size of graphs nowadays. Thus, large-scale GCNs call for a multi-node acceleration system (MultiAccSys), like the tensor processing unit (TPU) Pod for large-scale neural networks. In this work, we aim to scale up single-node GCN accelerators to accelerate GCNs on large-scale graphs. We first identify the communication pattern and challenges of multi-node acceleration for GCNs on large-scale graphs. We observe that (1) irregular coarse-grained communication patterns exist in the execution of GCNs in MultiAccSys, which introduce a massive amount of redundant network transmissions and off-chip memory accesses; (2) the acceleration of GCNs in MultiAccSys is mainly bounded by network bandwidth but tolerates network latency. Guided by the above observations, we then propose MultiGCN, an efficient MultiAccSys for large-scale GCNs that trades network latency for network bandwidth. Specifically, by leveraging the network latency tolerance, we first propose a topology-aware multicast mechanism with a one put per multicast message-passing model to reduce transmissions and alleviate network bandwidth requirements. Second, we introduce a scatter-based round execution mechanism which cooperates with the multicast mechanism and reduces redundant off-chip memory accesses. Compared to the baseline MultiAccSys, MultiGCN achieves a 4x to 12x speedup using only 28% to 68% of the energy, while reducing transmissions by 32% and off-chip memory accesses by 73% on average. Besides, MultiGCN not only achieves a 2.5x to 8x speedup over the state-of-the-art multi-GPU solution, but also scales to large-scale graphs, as opposed to single-node GCN accelerators.
Pages: 3140-3152
Number of pages: 13
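Illustration
The abstract states that multicast reduces network transmissions compared with sending redundant copies of the same data. The sketch below is only a toy model of that idea, not the MultiGCN mechanism itself: it counts messages when a vertex feature is sent once per remote edge versus once per distinct remote node that hosts a neighbor. The function name `count_transmissions`, the `owner` partition map, and the example graph are all hypothetical, and the model ignores network topology and the one put per multicast message format described in the paper.

```python
from collections import defaultdict


def count_transmissions(edges, owner):
    """Return (per_edge, per_destination_node) message counts.

    edges: iterable of (src_vertex, dst_vertex) pairs.
    owner: dict mapping each vertex to the id of the node that stores it.
    Both arguments are illustrative assumptions, not taken from the paper.
    """
    per_edge = 0
    # For each source vertex, the set of remote nodes that need its feature.
    destinations = defaultdict(set)
    for src, dst in edges:
        if owner[src] != owner[dst]:
            per_edge += 1  # naive scheme: one message per remote edge
            destinations[src].add(owner[dst])
    # Grouped scheme: one message per (source vertex, remote node) pair.
    per_node = sum(len(nodes) for nodes in destinations.values())
    return per_edge, per_node


# Hypothetical 5-vertex graph split across 3 nodes.
edges = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2), (3, 4)]
owner = {0: 0, 1: 1, 2: 1, 3: 2, 4: 2}
per_edge, per_node = count_transmissions(edges, owner)
print(per_edge, per_node)
```

In this toy graph, vertex 0 has four remote neighbors spread over two nodes, so grouping by destination node halves the message count. The savings reported in the abstract come from the authors' own evaluation and are not reproduced by this sketch.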