Fast Arbitrary Precision Floating Point on FPGA

Authors
Licht, Johannes de Fine [1]
Pattison, Christopher A. [2]
Ziogas, Alexandros Nikolaos [1]
Simmons-Duffin, David [3]
Hoefler, Torsten [1]
Affiliations
[1] Swiss Fed Inst Technol, Dept Comp Sci, Zurich, Switzerland
[2] CALTECH, Inst Quantum Informat & Matter, Pasadena, CA 91125 USA
[3] CALTECH, Walter Burke Inst Theoret Phys, Pasadena, CA 91125 USA
Funding
European Research Council
Keywords
MULTIPLICATION;
DOI
10.1109/FCCM53951.2022.9786219
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
Numerical codes that require arbitrary precision floating point (APFP) numbers for their core computation are dominated by elementary arithmetic operations due to the superlinear complexity of multiplication in the number of mantissa bits. APFP computations on conventional software-based architectures are made exceedingly expensive by the lack of native hardware support, requiring elementary operations to be emulated using instructions operating on machine-word-sized blocks. In this work, we show how APFP multiplication on compile-time fixed-precision operands can be implemented as deep FPGA pipelines with a recursively defined Karatsuba decomposition on top of native DSP multiplication. When comparing our design implemented on an Alveo U250 accelerator to a dual-socket 36-core Xeon node running the GNU Multiple Precision Floating-Point Reliable (MPFR) library, we achieve a 9.8x speedup at 4.8 GOp/s for 512-bit multiplication, and a 5.3x speedup at 1.2 GOp/s for 1024-bit multiplication, corresponding to the throughput of more than 351x and 191x CPU cores, respectively. We apply this architecture to general matrix-matrix multiplication, yielding a 10x speedup at 2.0 GOp/s over the Xeon node, equivalent to more than 375x CPU cores, effectively allowing a single FPGA to replace a small CPU cluster. Due to the significant dependence of some numerical codes on APFP, such as semidefinite program solvers, we expect these gains to translate into real-world speedups. Our configurable and flexible HLS-based code provides a high-level software interface for plug-and-play acceleration, published as an open-source project.
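
The recursively defined Karatsuba decomposition named in the abstract can be illustrated with a short software sketch. This is not the authors' HLS code. It is a minimal Python model under stated assumptions: operands are unsigned integer mantissas of a fixed bit width, the recursion bottoms out at a hypothetical base_bits threshold that stands in for a native DSP multiplier, and the subtractive form of Karatsuba is used so that every recursive operand fits in half the width. The function names and the 16-bit threshold are chosen here for illustration only.

```python
import random


def karatsuba(a: int, b: int, bits: int, base_bits: int = 16) -> int:
    """Multiply unsigned integers a, b < 2**bits with recursive Karatsuba.

    base_bits is a hypothetical leaf width standing in for a native
    DSP multiplier; it is an assumption, not a value from the paper.
    """
    if bits <= base_bits:
        return a * b  # leaf: one "native" multiplication

    half = (bits + 1) // 2          # low part has `half` bits, high part at most `half`
    mask = (1 << half) - 1
    a0, a1 = a & mask, a >> half    # a = a1 * 2**half + a0
    b0, b1 = b & mask, b >> half    # b = b1 * 2**half + b0

    z0 = karatsuba(a0, b0, half, base_bits)   # low  * low
    z2 = karatsuba(a1, b1, half, base_bits)   # high * high

    # Subtractive middle term: (a1 - a0) * (b1 - b0), tracked as sign + magnitude
    # so that both recursive operands stay below 2**half.
    da, db = a1 - a0, b1 - b0
    sign = -1 if (da < 0) != (db < 0) else 1
    zm = sign * karatsuba(abs(da), abs(db), half, base_bits)

    z1 = z0 + z2 - zm               # equals a1*b0 + a0*b1
    return (z2 << (2 * half)) + (z1 << half) + z0


def leaf_multiplies(bits: int, base_bits: int = 16) -> int:
    """Count leaf multiplications performed by karatsuba() for a given width."""
    if bits <= base_bits:
        return 1
    return 3 * leaf_multiplies((bits + 1) // 2, base_bits)


if __name__ == "__main__":
    rng = random.Random(0)
    x, y = rng.getrandbits(512), rng.getrandbits(512)
    assert karatsuba(x, y, 512) == x * y
    print(leaf_multiplies(512), "leaf multiplies for 512-bit operands")
```

Because the operand width is fixed, the recursion depth and the number of leaf multiplications are known ahead of time. Under the assumptions of this sketch, a 512-bit product needs 3^5 = 243 leaf multiplications of 16 bits, compared with 32^2 = 1024 for the schoolbook method at the same leaf width.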
Pages: 182-190
Page count: 9