Optimized Implementation of the HPCG Benchmark on Reconfigurable Hardware

Cited by: 5
Authors
Zeni, Alberto [1, 2]
O'Brien, Kenneth [1]
Blott, Michaela [1]
Santambrogio, Marco D. [2]
Affiliations
[1] Xilinx Inc, Res Labs, Dublin, Ireland
[2] Politecn Milan, Milan, Italy
Source
Keywords
Reconfigurable architectures; High performance computing; Benchmark testing; HIGH-PERFORMANCE;
DOI
10.1007/978-3-030-85665-6_38
Chinese Library Classification (CLC) number
TP301 [Theory, Methods];
Subject classification code
081202;
Abstract
The HPCG benchmark is a modern complement to the HPL benchmark for evaluating the performance of HPC systems, and it is recognized as more representative of real-world applications. While typical workloads become more and more challenging, the semiconductor industry is struggling with performance scaling and power efficiency on next-generation technology nodes. As a result, the industry is turning towards more customized compute architectures to help meet the latest performance requirements. In this paper, we present the details of the first FPGA-based implementation of HPCG that takes advantage of such customized compute architectures. Our results show that our high-performance multi-FPGA implementation, using 1 and 4 Xilinx Alveo U280 cards, achieves up to 108.3 GFlops and 346.5 GFlops respectively, representing speed-ups of 104.1x and 333.2x over software running on a server with an Intel Xeon processor, with no loss of accuracy. We also demonstrate that the FPGA-based solution achieves performance comparable to modern GPUs and up to a 2.7x improvement in power efficiency compared to an NVIDIA Tesla V100. Finally, a theoretical evaluation based on Berkeley's Roofline model demonstrates that our implementation is near-optimally tuned on the Xilinx Alveo U280.
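The figures quoted in the abstract can be cross-checked with a little arithmetic. The sketch below is illustrative only and is not taken from the paper: the function names are hypothetical, and the Roofline helper uses the standard bound (attainable performance is the lesser of peak compute and memory bandwidth times arithmetic intensity) with placeholder hardware numbers rather than Alveo U280 specifications.

```python
# Illustrative arithmetic on the abstract's reported figures.
# All function names are hypothetical; only the 108.3 / 346.5 GFlops and
# 104.1x / 333.2x values come from the abstract.

def implied_baseline_gflops(fpga_gflops: float, speedup: float) -> float:
    """CPU baseline implied by an FPGA result and its reported speed-up."""
    return fpga_gflops / speedup


def scaling_efficiency(single_gflops: float, multi_gflops: float, n_devices: int) -> float:
    """Fraction of ideal linear scaling achieved when going to n_devices."""
    return multi_gflops / (n_devices * single_gflops)


def roofline_gflops(peak_gflops: float, bandwidth_gbs: float, intensity: float) -> float:
    """Standard Roofline bound: min(peak compute, bandwidth * flops-per-byte)."""
    return min(peak_gflops, bandwidth_gbs * intensity)


if __name__ == "__main__":
    # Both reported speed-ups should imply roughly the same CPU baseline.
    print(implied_baseline_gflops(108.3, 104.1))
    print(implied_baseline_gflops(346.5, 333.2))
    # Scaling from 1 to 4 cards.
    print(scaling_efficiency(108.3, 346.5, 4))
    # Placeholder hardware numbers (assumptions, NOT U280 specifications).
    print(roofline_gflops(peak_gflops=1000.0, bandwidth_gbs=100.0, intensity=0.25))
```

Both speed-up figures imply a CPU baseline of roughly 1.04 GFlops, and the 4-card result corresponds to about 80% of ideal linear scaling over the single-card result.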
Pages: 616-630
Number of pages: 15