Distributed large-scale graph processing on FPGAs

Cited by: 1
Authors
Sahebi, Amin [1 ,2 ]
Barbone, Marco [3 ]
Procaccini, Marco [1 ,5 ]
Luk, Wayne [3 ]
Gaydadjiev, Georgi [3 ,4 ]
Giorgi, Roberto [1 ,5 ]
Affiliations
[1] Univ Siena, Dept Informat Engn & Math, Siena, Italy
[2] Univ Florence, Dept Informat Engn, Florence, Italy
[3] Imperial Coll London, Dept Comp, London, England
[4] Delft Univ Technol, Dept Quantum & Comp Engn, Delft, Netherlands
[5] Consorzio Interuniv Nazl Informat, Rome, Italy
Funding
UK Engineering and Physical Sciences Research Council (EPSRC); EU Horizon 2020
Keywords
Graph processing; Distributed computing; Grid partitioning; FPGA; Accelerators; MODEL;
DOI
10.1186/s40537-023-00756-x
Chinese Library Classification (CLC) number
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
Processing large-scale graphs is challenging because the computation produces irregular memory access patterns. Managing such irregular accesses may cause significant performance degradation on both CPUs and GPUs. Thus, recent research trends propose graph processing acceleration with Field-Programmable Gate Arrays (FPGAs). FPGAs are programmable hardware devices that can be fully customised to perform specific tasks in a highly parallel and efficient manner. However, FPGAs have a limited amount of on-chip memory that cannot fit the entire graph. Because of this limited device memory, data needs to be repeatedly transferred to and from the FPGA on-chip memory, which makes data transfer time dominate over computation time. A possible way to overcome the FPGA accelerators' resource limitation is to engage a multi-FPGA distributed architecture and use an efficient partitioning scheme. Such a scheme aims to increase data locality and minimise communication between different partitions. This work proposes an FPGA processing engine that overlaps, hides and customises all data transfers so that the FPGA accelerator is fully utilised. This engine is integrated into a framework for using FPGA clusters and is able to use an offline partitioning method to facilitate the distribution of large-scale graphs. The proposed framework uses Hadoop at a higher level to map a graph to the underlying hardware platform. The higher layer of computation is responsible for gathering the blocks of data that have been pre-processed and stored on the host's file system and distributing them to a lower layer of computation made of FPGAs. We show how graph partitioning combined with an FPGA architecture leads to high performance, even when the graph has millions of vertices and billions of edges.
In the case of the PageRank algorithm, widely used for ranking the importance of nodes in a graph, our implementation is the fastest when compared to state-of-the-art CPU and GPU solutions, achieving a speedup of 13x compared to 8x and 3x, respectively. Moreover, in the case of large-scale graphs, the GPU solution fails due to memory limitations, while the CPU solution achieves a speedup of 12x compared to the 26x achieved by our FPGA solution. Other state-of-the-art FPGA solutions are 28 times slower than our proposed solution. When the size of a graph limits the performance of a single FPGA device, our performance model shows that using multiple FPGAs in a distributed system can further improve performance by about 12x. This highlights the efficiency of our implementation for large datasets that do not fit in the on-chip memory of a hardware device.
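As a rough illustration of the ideas named in the abstract and keywords (grid partitioning and PageRank), the following minimal Python sketch splits an edge list into a grid of blocks and runs PageRank one block at a time. It is not the authors' implementation: the function names `grid_partition` and `pagerank_blocked`, the grid size `p`, the damping factor of 0.85 and the handling of vertices without out-edges are all assumptions made for this example, and the FPGA engine, Hadoop layer and transfer overlapping are not modelled.

```python
from collections import defaultdict


def grid_partition(edges, num_vertices, p):
    """Split an edge list into a p x p grid of blocks (hypothetical helper).

    Block (i, j) holds the edges whose source lies in vertex interval i
    and whose destination lies in vertex interval j.
    """
    size = -(-num_vertices // p)  # ceiling division: vertices per interval
    blocks = defaultdict(list)
    for src, dst in edges:
        blocks[(src // size, dst // size)].append((src, dst))
    return blocks


def pagerank_blocked(edges, num_vertices, p=2, damping=0.85, iters=50):
    """PageRank that processes one grid block at a time (illustrative only)."""
    blocks = grid_partition(edges, num_vertices, p)
    out_deg = [0] * num_vertices
    for src, _ in edges:
        out_deg[src] += 1
    rank = [1.0 / num_vertices] * num_vertices
    for _ in range(iters):
        acc = [0.0] * num_vertices
        # Each block reads ranks from one source interval and updates one
        # destination interval, which is what bounds the working set.
        for key in sorted(blocks):
            for src, dst in blocks[key]:
                acc[dst] += rank[src] / out_deg[src]
        # Assumption: rank held by vertices with no out-edges is spread uniformly.
        dangling = sum(r for r, d in zip(rank, out_deg) if d == 0)
        base = (1.0 - damping) / num_vertices + damping * dangling / num_vertices
        rank = [base + damping * a for a in acc]
    return rank


if __name__ == "__main__":
    toy_edges = [(0, 1), (1, 2), (2, 0), (3, 0)]
    print(pagerank_blocked(toy_edges, num_vertices=4))
```

In a setting like the one the abstract describes, each block would be a unit of data that can be stored, distributed and streamed to a device independently; here the blocks are simply visited in sorted order on the host.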
Pages: 28
Related papers
50 in total
  • [21] DPM: A novel distributed large-scale social graph processing framework for link prediction algorithms
    Corbellini, Alejandro
    Godoy, Daniela
    Mateos, Cristian
    Schiaffino, Silvia
    Zunino, Alejandro
    FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2018, 78 : 474 - 480
  • [22] DynamoGraph: A Distributed System for Large-scale, Temporal Graph Processing, its Implementation and First Observations
    Steinbauer, Matthias
    Anderst-Kotsis, Gabriele
    PROCEEDINGS OF THE 25TH INTERNATIONAL CONFERENCE ON WORLD WIDE WEB (WWW'16 COMPANION), 2016, : 861 - 866
  • [23] Practical Near-Data-Processing Architecture for Large-Scale Distributed Graph Neural Network
    Huang, Linyong
    Zhang, Zhe
    Li, Shuangchen
    Niu, Dimin
    Guan, Yijin
    Zheng, Hongzhong
    Xie, Yuan
    IEEE ACCESS, 2022, 10 : 46796 - 46807
  • [24] GraphH: A Processing-in-Memory Architecture for Large-Scale Graph Processing
    Dai, Guohao
    Huang, Tianhao
    Chi, Yuze
    Zhao, Jishen
    Sun, Guangyu
    Liu, Yongpan
    Wang, Yu
    Xie, Yuan
    Yang, Huazhong
    IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, 2019, 38 (04) : 640 - 653
  • [25] Large-scale Cellular Automata on FPGAs
    Kyparissas N.
    Dollas A.
    ACM Transactions on Reconfigurable Technology and Systems, 2020, 14 (01):
  • [26] Complex query processing in large-scale distributed system
    Zhou, Ao-Ying
    Zhou, Min-Qi
    Qian, Wei-Ning
    Zhang, Rong
    Jisuanji Xuebao/Chinese Journal of Computers, 2008, 31 (09) : 1563 - 1572
  • [27] Distributed Data Processing for Large-Scale Simulations on Cloud
    Lu, Tianjian
    Hoyer, Stephan
    Wang, Qing
    Hu, Lily
    Chen, Yi-Fan
    2021 JOINT IEEE INTERNATIONAL SYMPOSIUM ON ELECTROMAGNETIC COMPATIBILITY, SIGNAL & POWER INTEGRITY, AND EMC EUROPE (EMC+SIPI AND EMC EUROPE), 2021, : 53 - 58
  • [28] IOGP: An Incremental Online Graph Partitioning for Large-Scale Distributed Graph Databases
    Dai, Dong
    Zhang, Wei
    Chen, Yong
    ACM SIGPLAN NOTICES, 2017, 52 (08) : 439 - 440
  • [29] Large-Scale Distributed Graph Computing Systems: An Experimental Evaluation
    Lu, Yi
    Cheng, James
    Yan, Da
    Wu, Huanhuan
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2014, 8 (03) : 281 - 292
  • [30] Mycelium: Large-Scale Distributed Graph Queries with Differential Privacy
    Roth, Edo
    Newatia, Karan
    Ma, Yiping
    Zhong, Ke
    Angel, Sebastian
    Haeberlen, Andreas
    PROCEEDINGS OF THE 28TH ACM SYMPOSIUM ON OPERATING SYSTEMS PRINCIPLES, SOSP 2021, 2021, : 327 - 343