Distributed large-scale graph processing on FPGAs

Authors
Amin Sahebi
Marco Barbone
Marco Procaccini
Wayne Luk
Georgi Gaydadjiev
Roberto Giorgi
Affiliations
[1] University of Siena, Department of Information Engineering and Mathematics
[2] University of Florence, Department of Information Engineering
[3] Imperial College London, Department of Computing
[4] Delft University of Technology, Department of Quantum and Computer Engineering
[5] Consorzio Interuniversitario Nazionale per l’Informatica
Keywords
Graph processing; Distributed computing; Grid partitioning; FPGA; Accelerators
DOI
Not available
Abstract
Processing large-scale graphs is challenging because the computation causes irregular memory access patterns. Managing such irregular accesses can cause significant performance degradation on both CPUs and GPUs. Recent research has therefore proposed accelerating graph processing with Field-Programmable Gate Arrays (FPGAs). FPGAs are programmable hardware devices that can be fully customised to perform specific tasks in a highly parallel and efficient manner. However, FPGAs have a limited amount of on-chip memory that cannot hold an entire graph. Because of this limited device memory, data must be repeatedly transferred to and from the FPGA on-chip memory, so data-transfer time dominates computation time. One way to overcome this resource limitation of FPGA accelerators is to use a distributed multi-FPGA architecture together with an efficient partitioning scheme, which aims to increase data locality and minimise communication between partitions. This work proposes an FPGA processing engine that overlaps, hides and customises all data transfers so that the FPGA accelerator is fully utilised. The engine is integrated into a framework for FPGA clusters and can use an offline partitioning method to facilitate the distribution of large-scale graphs. The framework uses Hadoop at the higher level to map a graph onto the underlying hardware platform: this higher layer of computation gathers the blocks of data that have been pre-processed and stored on the host’s file system and distributes them to a lower layer of computation made of FPGAs. We show how graph partitioning combined with an FPGA architecture leads to high performance, even when the graph has millions of vertices and billions of edges.
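The abstract names grid partitioning as a keyword but does not describe the scheme in detail. As an illustration only, the Python sketch below assumes a common 2D grid layout: vertices are split into p contiguous intervals, and each edge (src, dst) is placed in block (i, j), where i is the interval of src and j the interval of dst. The point of such a layout is that processing one block only needs the vertex data of two intervals, which is what makes it possible to stream blocks through a device with small on-chip memory. The function names `interval_of`, `grid_partition` and `block_vertex_ranges` are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def interval_of(vertex, num_vertices, p):
    """Hypothetical helper: map a vertex id to one of p contiguous intervals."""
    size = -(-num_vertices // p)  # ceiling division: vertices per interval
    return vertex // size


def grid_partition(edges, num_vertices, p):
    """Hypothetical 2D grid partitioner (an assumed scheme, not the paper's code).

    Each edge (src, dst) goes to block (i, j), where i is the source interval
    and j is the destination interval.
    """
    blocks = defaultdict(list)
    for src, dst in edges:
        i = interval_of(src, num_vertices, p)
        j = interval_of(dst, num_vertices, p)
        blocks[(i, j)].append((src, dst))
    return dict(blocks)


def block_vertex_ranges(i, j, num_vertices, p):
    """Vertex id ranges that must be resident to process block (i, j)."""
    size = -(-num_vertices // p)
    src_range = range(i * size, min((i + 1) * size, num_vertices))
    dst_range = range(j * size, min((j + 1) * size, num_vertices))
    return src_range, dst_range
```

For example, with the edges (0, 1), (1, 2), (2, 3), (3, 0) and (0, 2) on 4 vertices and p = 2, the edges (1, 2) and (0, 2) both land in block (0, 1), so processing that block only requires source vertices 0 to 1 and destination vertices 2 to 3 to be resident.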
For the PageRank algorithm, which is widely used to rank the importance of nodes in a graph, our implementation is the fastest when compared with state-of-the-art CPU and GPU solutions, achieving a 13x speedup versus 8x and 3x for the CPU and GPU solutions, respectively. Moreover, for large-scale graphs the GPU solution fails because of memory limitations, while the CPU solution achieves a 12x speedup compared with the 26x achieved by our FPGA solution. Other state-of-the-art FPGA solutions are 28 times slower than our proposed solution. When the size of a graph limits the performance of a single FPGA device, our performance model shows that using multiple FPGAs in a distributed system can further improve performance by about 12x. This highlights the efficiency of our implementation for large datasets that do not fit in the on-chip memory of a hardware device.
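The evaluation above is based on PageRank. As a point of reference only, the sketch below is a plain-Python, single-machine PageRank written in an edge-centric style: each iteration walks the edge list and pushes a share of each source's rank to its destination, which is the kind of loop an edge-streaming accelerator would process block by block. It is not the authors' FPGA kernel. The damping factor of 0.85, the fixed iteration count and the uniform redistribution of rank held by vertices without out-edges are assumptions of this sketch.

```python
def pagerank(edges, num_vertices, damping=0.85, iterations=20):
    """Edge-centric PageRank reference (illustrative sketch, not the paper's kernel).

    Assumed parameters: damping=0.85 and a fixed number of iterations.
    """
    out_degree = [0] * num_vertices
    for src, _ in edges:
        out_degree[src] += 1

    rank = [1.0 / num_vertices] * num_vertices
    for _ in range(iterations):
        # Assumption: rank held by dangling vertices (no out-edges) is spread uniformly.
        dangling = sum(rank[v] for v in range(num_vertices) if out_degree[v] == 0)
        base = (1.0 - damping) / num_vertices + damping * dangling / num_vertices
        new_rank = [base] * num_vertices
        # Scatter phase: every edge pushes rank[src] / out_degree[src] to dst.
        for src, dst in edges:
            new_rank[dst] += damping * rank[src] / out_degree[src]
        rank = new_rank
    return rank
```

Because dangling rank is redistributed, the ranks keep summing to 1 after every iteration; on a directed 3-cycle, all three vertices end with rank 1/3.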
Published in
Journal of Big Data, 2023, 10(1)