Multi-Node Acceleration for Large-Scale GCNs

Cited by: 4
Authors
Sun, Gongjian [1 ,2 ]
Yan, Mingyu [1 ,2 ]
Wang, Duo [1 ,2 ]
Li, Han [1 ,2 ]
Li, Wenming [1 ,2 ]
Ye, Xiaochun [1 ,2 ]
Fan, Dongrui [1 ,2 ]
Xie, Yuan [3 ]
Affiliations
[1] Chinese Acad Sci, Inst Comp Technol, State Key Lab Processors, Beijing 100045, Peoples R China
[2] Univ Chinese Acad Sci, Beijing 101408, Peoples R China
[3] Univ Calif Santa Barbara, Santa Barbara, CA 93106 USA
Funding
National Natural Science Foundation of China
Keywords
Deep learning; graph neural network; hardware accelerator; multi-node system; communication optimization; NETWORK;
DOI
10.1109/TC.2022.3207127
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Limited by memory capacity and computation power, single-node graph convolutional neural network (GCN) accelerators cannot complete the execution of GCNs within a reasonable amount of time, due to the explosive size of graphs nowadays. Thus, large-scale GCNs call for a multi-node acceleration system (MultiAccSys), like the tensor processing unit (TPU) Pod for large-scale neural networks. In this work, we aim to scale up single-node GCN accelerators to accelerate GCNs on large-scale graphs. We first identify the communication pattern and challenges of multi-node acceleration for GCNs on large-scale graphs. We observe that (1) irregular coarse-grained communication patterns exist in the execution of GCNs in MultiAccSys, which introduce a massive amount of redundant network transmissions and off-chip memory accesses; (2) the acceleration of GCNs in MultiAccSys is mainly bounded by network bandwidth but tolerates network latency. Guided by these observations, we then propose MultiGCN, an efficient MultiAccSys for large-scale GCNs that trades network latency for network bandwidth. Specifically, by leveraging the network latency tolerance, we first propose a topology-aware multicast mechanism with a one put per multicast message-passing model to reduce transmissions and alleviate network bandwidth requirements. Second, we introduce a scatter-based round execution mechanism that cooperates with the multicast mechanism and reduces redundant off-chip memory accesses. Compared to the baseline MultiAccSys, MultiGCN achieves 4~12x speedup using only 28%~68% of the energy, while reducing transmissions by 32% and off-chip memory accesses by 73% on average. In addition, MultiGCN not only achieves 2.5~8x speedup over the state-of-the-art multi-GPU solution, but also scales to large-scale graphs, unlike single-node GCN accelerators.
Pages: 3140-3152
Number of pages: 13
Related papers
50 entries in total
  • [21] FAWS: FPGA Acceleration of Large-Scale Wave Simulations
    Gourounas, Dimitrios
    Hanindhito, Bagus
    Fathi, Arash
    Trenev, Dimitar
    John, Lizy K.
    Gerstlauer, Andreas
    2023 IEEE 34TH INTERNATIONAL CONFERENCE ON APPLICATION-SPECIFIC SYSTEMS, ARCHITECTURES AND PROCESSORS, ASAP, 2023, : 76 - 84
  • [22] GPU Acceleration of Zernike Moments for Large-scale Images
    Ujaldon, Manuel
    2009 IEEE INTERNATIONAL SYMPOSIUM ON PARALLEL & DISTRIBUTED PROCESSING, VOLS 1-5, 2009, : 2033 - 2040
  • [23] Particle acceleration in large-scale DC electric fields
    Holman, GD
    HIGH ENERGY SOLAR PHYSICS - ANTICIPATING HESSI, 2000, 206 : 135 - 144
  • [24] On the Acceleration of the Vector Fitting for Multiport Large-Scale Macromodeling
    Chou, Chiu-Chih
    Schutt-Aine, Jose E.
    IEEE MICROWAVE AND WIRELESS COMPONENTS LETTERS, 2021, 31 (01) : 1 - 4
  • [25] GPU acceleration of ADMM for large-scale quadratic programming
    Schubiger, Michel
    Banjac, Goran
    Lygeros, John
    JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2020, 144 : 55 - 67
  • [26] Electron acceleration sites in a large-scale coronal structure
    Klein, KL
    Aurass, H
    SoruEscaut, I
    Kalman, B
    ASTRONOMY & ASTROPHYSICS, 1997, 320 (02) : 612 - 619
  • [27] Large scale multi-node simulations of Z2 gauge theory quantum circuits using Google Cloud Platform
    Gustafson, Erik
    Holzman, Burt
    Kowalkowski, James
    Lamm, Henry
    Li, Andy C. Y.
    Perdue, Gabriel
    Isakov, Sergei, V
    Martin, Orion
    Thomson, Ross
    Beall, Jackson
    Ganahl, Martin
    Vidal, Guifre
    Peters, Evan
    Boixo, Sergio
    PROCEEDINGS OF SECOND INTERNATIONAL WORKSHOP ON QUANTUM COMPUTING SOFTWARE (QCS 2021), 2021, : 72 - 79
  • [28] Acceleration of 3D ECT image reconstruction in heterogeneous, multi-GPU, multi-node distributed system
    Majchrowicz, Michal
    Kapusta, Pawel
    Jackowska-Strumillo, Lidia
    Sankowski, Dominik
    PROCEEDINGS OF THE 2018 FEDERATED CONFERENCE ON COMPUTER SCIENCE AND INFORMATION SYSTEMS (FEDCSIS), 2018, : 347 - 350
  • [29] Robustness of scale-free networks with dynamical behavior against multi-node perturbation
    Lv, Changchun
    Yuan, Ziwei
    Si, Shubin
    Duan, Dongli
    CHAOS SOLITONS & FRACTALS, 2021, 152
  • [30] Optimization of a Multi-node System Reliability Model
    魏展明
    周凡
    陈耀武
    Journal of Donghua University(English Edition), 2011, 28 (05) : 451 - 455