Fast Arbitrary Precision Floating Point on FPGA

Cited by: 0

Authors:

Licht, Johannes de Fine [1]

Pattison, Christopher A. [2]

Ziogas, Alexandros Nikolaos [1]

Simmons-Duffin, David [3]

Hoefler, Torsten [1]

Affiliations:

[1] Swiss Fed Inst Technol, Dept Comp Sci, Zurich, Switzerland

[2] CALTECH, Inst Quantum Informat & Matter, Pasadena, CA 91125 USA

[3] CALTECH, Walter Burke Inst Theoret Phys, Pasadena, CA 91125 USA

Source:

2022 IEEE 30TH INTERNATIONAL SYMPOSIUM ON FIELD-PROGRAMMABLE CUSTOM COMPUTING MACHINES (FCCM 2022) | 2022

Funding:

European Research Council;

Keywords:

MULTIPLICATION;

DOI:

10.1109/FCCM53951.2022.9786219

Chinese Library Classification:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812;

Abstract:

Numerical codes that require arbitrary precision floating point (APFP) numbers for their core computation are dominated by elementary arithmetic operations due to the superlinear complexity of multiplication in the number of mantissa bits. APFP computations on conventional software-based architectures are made exceedingly expensive by the lack of native hardware support, requiring elementary operations to be emulated using instructions operating on machine-word-sized blocks. In this work, we show how APFP multiplication on compile-time fixed-precision operands can be implemented as deep FPGA pipelines with a recursively defined Karatsuba decomposition on top of native DSP multiplication. When comparing our design implemented on an Alveo U250 accelerator to a dual-socket 36-core Xeon node running the GNU Multiple Precision Floating-Point Reliable (MPFR) library, we achieve a 9.8x speedup at 4.8 GOp/s for 512-bit multiplication, and a 5.3x speedup at 1.2 GOp/s for 1024-bit multiplication, corresponding to the throughput of more than 351x and 191x CPU cores, respectively. We apply this architecture to general matrix-matrix multiplication, yielding a 10x speedup at 2.0 GOp/s over the Xeon node, equivalent to more than 375x CPU cores, effectively allowing a single FPGA to replace a small CPU cluster. Due to the significant dependence of some numerical codes on APFP, such as semidefinite program solvers, we expect these gains to translate into real-world speedups. Our configurable and flexible HLS-based code provides a high-level software interface for plug-and-play acceleration, published as an open source project.
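
The recursively defined Karatsuba decomposition mentioned in the abstract can be illustrated in software. The following is a minimal sketch on plain Python integers, not the authors' HLS implementation: the function name `karatsuba`, the `BASE_BITS` threshold of 18 bits (standing in for the width of a native hardware multiplier), and the operand values are all assumptions chosen for illustration.

```python
BASE_BITS = 18  # assumed width of a "native" multiplier; illustrative only


def karatsuba(a: int, b: int, bits: int) -> int:
    """Multiply two non-negative integers that each fit in `bits` bits."""
    if bits <= BASE_BITS:
        # Base case: stands in for a single native multiplication.
        return a * b

    # Split each operand into a low half and a high half.
    half = (bits + 1) // 2
    mask = (1 << half) - 1
    a_lo, a_hi = a & mask, a >> half
    b_lo, b_hi = b & mask, b >> half

    # Three recursive products instead of the four a schoolbook split needs.
    lo = karatsuba(a_lo, b_lo, half)
    hi = karatsuba(a_hi, b_hi, half)
    # The sums can carry into one extra bit, hence half + 1.
    cross = karatsuba(a_lo + a_hi, b_lo + b_hi, half + 1) - lo - hi

    # Recombine: a*b = hi * 2^(2*half) + cross * 2^half + lo
    return (hi << (2 * half)) + (cross << half) + lo


# Example: multiply two 512-bit mantissa-sized integers.
x = (1 << 512) - 12345
y = (1 << 511) + 67890
product = karatsuba(x, y, 512)
```

Each level replaces one wide multiplication with three multiplications of roughly half the width plus a few additions and shifts, which is what makes the cost grow more slowly than the quadratic schoolbook method. Because the operand width is passed explicitly, the recursion depth is fixed once `bits` is known, loosely mirroring the compile-time fixed precision the abstract describes.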

Citation

Pages: 182-190

Page count: 9

50 entries in total

[31] Design and Implementation for Quadruple Precision Floating-point Multiplier Based on FPGA with Lower Resource Occupancy
Kang Lei
Yan Xiao-ying
2014 Fifth International Conference on Intelligent Systems Design and Engineering Applications (ISDEA), 2014, : 326 - 329
[32] Design and Synthesis of Single Precision Floating Point Division based on Newton-Raphson Algorithm on FPGA
Singh, Naginder
Sasamal, Trailokya Nath
4TH INTERNATIONAL CONFERENCE ON ADVANCEMENTS IN ENGINEERING & TECHNOLOGY (ICAET-2016), 2016, 57
[33] FPGA-Based Training of Convolutional Neural Networks With a Reduced Precision Floating-Point Library
DiCecco, Roberto
Sun, Lin
Chow, Paul
2017 INTERNATIONAL CONFERENCE ON FIELD PROGRAMMABLE TECHNOLOGY (ICFPT), 2017, : 239 - 242
[34] LDPC decoder with a limited-precision FPGA-based floating-point multiplication coprocessor
Moberly, Raymond
O'Sullivana, Michael
Waheed, Khurram
ADVANCED SIGNAL PROCESSING ALGORITHMS, ARCHITECTURES, AND IMPLEMENTATIONS XVII, 2007, 6697
[35] Double Precision Hybrid-Mode Floating-Point FPGA CORDIC Co-processor
Zhou, Jie
Dou, Yong
Lei, Yuanwu
Xu, Jinbo
Dong, Yazhuo
HPCC 2008: 10TH IEEE INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS, PROCEEDINGS, 2008, : 182 - 189
[36] Design and Implementation of Differential Evolution Algorithm on FPGA for Double-Precision Floating-Point Representation
Cortes-Antonio, Prometeo
Rangel-Gonzalez, Josue
Villa-Vargas, Luis A.
Antonio Ramirez-Salinas, Marco
Molina-Lozano, Heron
Batyrshin, Ildar
ACTA POLYTECHNICA HUNGARICA, 2014, 11 (04) : 139 - 153
[37] Floating Point FPGA Architecture of PID Controller
Wadgaonkar, Jagannath
Bhole, Kalyani
Singh, Prateek
2015 INTERNATIONAL CONFERENCE ON INDUSTRIAL INSTRUMENTATION AND CONTROL (ICIC), 2015, : 1259 - 1263
[38] Floating-point matrix product on FPGA
Bensaali, Faycal
Amira, Abbes
Sotudeh, Reza
2007 IEEE/ACS INTERNATIONAL CONFERENCE ON COMPUTER SYSTEMS AND APPLICATIONS, VOLS 1 AND 2, 2007, : 466 - +
[39] Floating-Point FPGA: Architecture and Modeling
Ho, Chun Hok
Yu, Chi Wai
Leong, Philip
Luk, Wayne
Wilton, Steven J. E.
IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, 2009, 17 (12) : 1709 - 1718
[40] An Efficient FPGA Implementation Of Floating Point Addition
Pesic, Djordje
Ratkovic, Ivan
2015 23RD TELECOMMUNICATIONS FORUM TELFOR (TELFOR), 2015, : 685 - 688
