Arithmetic Algorithms for Extended Precision Using Floating-Point Expansions

Cited by: 15
Authors
Joldes, Mioara [1 ]
Marty, Olivier [2 ]
Muller, Jean-Michel [3 ]
Popescu, Valentina [4 ]
Affiliations
[1] CNRS, LAAS Lab, 7 Ave Colonel Roche, F-31077 Toulouse, France
[2] ENS Cachan, 61 Ave President Wilson, F-94230 Cachan, France
[3] Ecole Normale Super Lyon, CNRS, LIP Lab, 46 Allee Italie, F-69364 Lyon 07, France
[4] Ecole Normale Super Lyon, LIP Lab, 46 Allee Italie, F-69364 Lyon 07, France
Keywords
Floating-point arithmetic; floating-point expansions; high precision arithmetic; multiple-precision arithmetic; division; reciprocal; square root; Newton-Raphson iteration
DOI
10.1109/TC.2015.2441714
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Many numerical problems require a higher computing precision than the one offered by standard floating-point (FP) formats. One common way of extending the precision is to represent numbers in a multiple-component format. By using the so-called floating-point expansions, real numbers are represented as the unevaluated sum of standard machine-precision FP numbers. This representation offers the simplicity of using directly available, hardware-implemented and highly optimized FP operations. It is used by multiple-precision libraries such as Bailey's QD or its analogue tuned for Graphics Processing Units (GPUs), GQD. In this article we briefly revisit algorithms for adding and multiplying FP expansions, then we introduce and prove new algorithms for normalizing, dividing and taking square roots of FP expansions. The new method used for computing the reciprocal $a^{-1}$ and the square root $\sqrt{a}$ of an FP expansion $a$ is based on an adapted Newton-Raphson iteration where the intermediate calculations are done using "truncated" operations (additions, multiplications) involving FP expansions. We give here a thorough error analysis showing that it allows very accurate computations. More precisely, after $q$ iterations, the computed FP expansion $x = x_0 + \dots + x_{2^q-1}$ satisfies, for the reciprocal algorithm, the relative error bound $\left|(x - a^{-1})/a^{-1}\right| \le 2^{-2^q(p-3)-1}$ and, respectively, for the square root one, $\left|x - 1/\sqrt{a}\right| \le 2^{-2^q(p-3)-1}/\sqrt{a}$, where $p > 2$ is the precision of the FP representation used ($p = 24$ for single precision and $p = 53$ for double precision).
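The two ideas the abstract relies on can be illustrated with a small sketch. It is not the paper's algorithm: `two_sum` is the classical error-free addition (Knuth's TwoSum), used here only to show what an "unevaluated sum" of doubles looks like, and `newton_reciprocal` runs the Newton-Raphson step in exact rational arithmetic (via Python's `fractions.Fraction`) rather than with truncated FP-expansion operations. All function names are hypothetical, chosen for this illustration.

```python
from fractions import Fraction


def two_sum(a, b):
    """Error-free addition (Knuth's TwoSum) for binary64 floats.

    Returns (s, e) where s is the rounded sum fl(a + b) and e is the
    rounding error, so that s + e equals a + b exactly (barring overflow).
    """
    s = a + b
    b_virtual = s - a
    a_virtual = s - b_virtual
    e = (a - a_virtual) + (b - b_virtual)
    return s, e


def expansion_value(terms):
    """Exact value of an FP expansion: the unevaluated sum of its terms."""
    return sum((Fraction(t) for t in terms), Fraction(0))


def newton_reciprocal(a, x0, q):
    """Apply q Newton-Raphson steps x <- x * (2 - a * x) towards 1/a.

    Assumption: exact rational arithmetic stands in for the truncated
    expansion operations analysed in the paper.
    """
    x = x0
    for _ in range(q):
        x = x * (2 - a * x)
    return x


def relative_error(a, x):
    """|x - 1/a| / |1/a|, which simplifies to |1 - a*x| for a != 0."""
    return abs(1 - a * x)


if __name__ == "__main__":
    # 1 + 2**-60 does not fit in one double, but fits in a 2-term expansion.
    hi, lo = two_sum(1.0, 2.0 ** -60)
    print(hi, lo, expansion_value([hi, lo]))

    # The relative error squares at every step, so the number of correct
    # bits roughly doubles per iteration.
    a, x0 = Fraction(3), Fraction(1, 4)
    for q in range(4):
        print(q, relative_error(a, newton_reciprocal(a, x0, q)))
```

In exact arithmetic the iteration satisfies $1 - a x_{n+1} = (1 - a x_n)^2$, so each step doubles the number of correct bits. This is consistent with the expansion in the abstract having $2^q$ terms after $q$ iterations and with the $2^q$ factor in the exponent of the error bounds.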
Pages: 1197-1210
Page count: 14