Faster Modular Exponentiation using Double Precision Floating Point Arithmetic on the GPU

Cited by: 0
Authors
Emmart, Niall [1]
Zheng, Fangyu [2, 3]
Weems, Charles [1]
Affiliations
[1] Univ Massachusetts, Coll Informat & Comp Sci, Amherst, MA 01003 USA
[2] Chinese Acad Sci, State Key Lab Informat Secur, Inst Informat Engn, Beijing, Peoples R China
[3] Chinese Acad Sci, Data Assurance & Commun Secur Res Ctr, Beijing, Peoples R China
Funding
National Science Foundation (USA); National Key Research and Development Program of China;
Keywords
MULTIPLICATION;
DOI
Not available
Chinese Library Classification
TP39 [Computer applications];
Discipline classification codes
081203 ; 0835 ;
Abstract
This paper presents a new approach to integer multiple precision (MP) modular exponentiation, using double-precision floating point (DPF) operations, that is suitable for GPU implementation. We show speedups ranging from 20% to 34% over the best prior GPU times for sizes corresponding to common RSA cryptographic operations (2048 to 4096 bits). Three techniques are described. First, by adding 2^104 to the high half of the product, and 2^52 to the low half, we set the implicit leading 1 in the DPF mantissa so that the full 52 explicit bits are available for each half of the 104-bit products of samples. Second, the DPF values are cast bitwise to 64-bit integers for adding the column sums to get the MP result. Normally the cast would require masking off the exponents, but because they are constant, we can include them in the column sums and correct just once for their total. Third, by initializing the column sums with the appropriate negative value to compensate for the exponent sums, no corrective subtraction is needed. Our implementation on an NVIDIA GTX Titan Black GPU achieves between 132.5K and 161.9K modular exponentiations per second of size 1024 bits, with latencies ranging from 21.7 ms to 17.8 ms, making it practical for online RSA applications. Proportional results are shown for 1536 and 2048 bits. The implementation is so efficient that its maximum sustained performance is actually bounded by the thermal limit of the GPU.
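The bit-level arithmetic behind the three techniques can be illustrated with a minimal Python sketch. It is not the authors' GPU code. It covers only the multiple-precision product (no modular reduction or exponentiation), and it assumes 52-bit limbs. The split of each product into biased high and low doubles is emulated with exact Python integers; on real hardware this step would presumably come from a fused multiply-add that rounds toward zero, which the abstract does not spell out. All names (`split_product`, `bits`, `mp_multiply`, `to_limbs`, `HI_BIAS`, `LO_BIAS`) are hypothetical.

```python
import struct

MASK52 = (1 << 52) - 1
M64 = (1 << 64) - 1
# Exponent fields of doubles in [2^104, 2^105) and [2^52, 2^53), shifted into place.
HI_BIAS = (1023 + 104) << 52
LO_BIAS = (1023 + 52) << 52


def bits(x: float) -> int:
    """Bitwise cast of a double to an unsigned 64-bit integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def split_product(a: int, b: int):
    """Technique 1 (emulated): return the two biased halves of a*b as doubles.

    hi = 2^104 + floor(a*b / 2^52) * 2^52  and  lo = 2^52 + (a*b mod 2^52).
    Both values have exactly 53 significant bits, so float() is exact here.
    """
    assert 0 <= a <= MASK52 and 0 <= b <= MASK52
    p = a * b
    hi = float((1 << 104) + ((p >> 52) << 52))
    lo = float((1 << 52) + (p & MASK52))
    return hi, lo


def to_limbs(x: int, n: int):
    """Split a non-negative integer into n little-endian 52-bit limbs."""
    return [(x >> (52 * i)) & MASK52 for i in range(n)]


def mp_multiply(a_limbs, b_limbs):
    """Schoolbook product using 64-bit wraparound column sums."""
    n, m = len(a_limbs), len(b_limbs)
    cols = []
    for k in range(n + m):
        # Technique 3: start each column at minus the total exponent bias
        # that the raw bit patterns will contribute to it.
        n_lo = sum(1 for i in range(n) if 0 <= k - i < m)
        n_hi = sum(1 for i in range(n) if 0 <= k - 1 - i < m)
        cols.append((-(n_lo * LO_BIAS + n_hi * HI_BIAS)) & M64)
    for i, a in enumerate(a_limbs):
        for j, b in enumerate(b_limbs):
            hi, lo = split_product(a, b)
            # Technique 2: add the raw bit patterns, exponents included,
            # without masking them off.
            cols[i + j] = (cols[i + j] + bits(lo)) & M64
            cols[i + j + 1] = (cols[i + j + 1] + bits(hi)) & M64
    # Resolve the (unnormalized) column sums into one integer.
    return sum(c << (52 * k) for k, c in enumerate(cols))
```

For example, `mp_multiply(to_limbs(A, 20), to_limbs(B, 20))` should equal `A * B` for any non-negative `A` and `B` below 2^1040. The wraparound sums are safe here because each column collects at most a few dozen 52-bit values, which stays far below 2^64.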
Pages: 130-137
Page count: 8