Fast Arbitrary Precision Floating Point on FPGA

Authors
Licht, Johannes de Fine [1]
Pattison, Christopher A. [2]
Ziogas, Alexandros Nikolaos [1]
Simmons-Duffin, David [3]
Hoefler, Torsten [1]
Affiliations
[1] Swiss Fed Inst Technol, Dept Comp Sci, Zurich, Switzerland
[2] CALTECH, Inst Quantum Informat & Matter, Pasadena, CA 91125 USA
[3] CALTECH, Walter Burke Inst Theoret Phys, Pasadena, CA 91125 USA
Funding
European Research Council
Keywords
MULTIPLICATION;
DOI
10.1109/FCCM53951.2022.9786219
Chinese Library Classification (CLC) number
TP3 [Computing Technology, Computer Technology]
Subject classification code
0812
Abstract
Numerical codes that require arbitrary precision floating point (APFP) numbers for their core computation are dominated by elementary arithmetic operations due to the superlinear complexity of multiplication in the number of mantissa bits. APFP computations on conventional software-based architectures are made exceedingly expensive by the lack of native hardware support, requiring elementary operations to be emulated using instructions operating on machine-word-sized blocks. In this work, we show how APFP multiplication on compile-time fixed-precision operands can be implemented as deep FPGA pipelines with a recursively defined Karatsuba decomposition on top of native DSP multiplication. When comparing our design implemented on an Alveo U250 accelerator to a dual-socket 36-core Xeon node running the GNU Multiple Precision Floating-Point Reliable (MPFR) library, we achieve a 9.8x speedup at 4.8 GOp/s for 512-bit multiplication, and a 5.3x speedup at 1.2 GOp/s for 1024-bit multiplication, corresponding to the throughput of more than 351x and 191x CPU cores, respectively. We apply this architecture to general matrix-matrix multiplication, yielding a 10x speedup at 2.0 GOp/s over the Xeon node, equivalent to more than 375x CPU cores, effectively allowing a single FPGA to replace a small CPU cluster. Due to the significant dependence of some numerical codes on APFP, such as semidefinite program solvers, we expect these gains to translate into real-world speedups. Our configurable and flexible HLS-based code provides a high-level software interface for plug-and-play acceleration, published as an open-source project.
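The recursively defined Karatsuba decomposition mentioned in the abstract can be illustrated with a minimal software sketch. This is not the authors' HLS code: it multiplies only non-negative integers (standing in for mantissas), the function name `karatsuba` and the parameter `base_bits` are hypothetical, and the default of 16 bits is an assumed stand-in for the width of a native multiplier, not a figure taken from the paper.

```python
def karatsuba(a: int, b: int, bits: int, base_bits: int = 16) -> int:
    """Multiply two non-negative integers of at most `bits` bits each.

    `base_bits` is an assumed cutoff below which a direct multiply is
    used; it stands in for a native hardware multiplier.
    """
    assert base_bits >= 4, "smaller cutoffs would not guarantee termination"
    assert 0 <= a < (1 << bits) and 0 <= b < (1 << bits)

    if bits <= base_bits:
        return a * b  # base case: one "native" multiplication

    # Split each operand into a low part of `half` bits and a high part.
    half = (bits + 1) // 2
    mask = (1 << half) - 1
    a_lo, a_hi = a & mask, a >> half
    b_lo, b_hi = b & mask, b >> half

    # Three recursive multiplications instead of four.
    lo = karatsuba(a_lo, b_lo, half, base_bits)
    hi = karatsuba(a_hi, b_hi, bits - half, base_bits)
    # The sums can carry into one extra bit, hence half + 1.
    cross = karatsuba(a_lo + a_hi, b_lo + b_hi, half + 1, base_bits)
    mid = cross - lo - hi  # equals a_lo*b_hi + a_hi*b_lo

    return (hi << (2 * half)) + (mid << half) + lo


if __name__ == "__main__":
    x = (1 << 512) - 1
    y = (1 << 512) - 3
    print(karatsuba(x, y, 512) == x * y)
```

Because `bits` is known up front, the recursion depth and operand widths at every level are fixed, which is the property that would allow such a decomposition to be unrolled into a pipeline when the precision is a compile-time constant.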
Pages: 182-190
Number of pages: 9