Accelerating RSA with Fine-Grained Parallelism Using GPU

Cited by: 15
Authors
Yang, Yang [1 ,2 ,3 ]
Guan, Zhi [1 ,2 ,3 ]
Sun, Huiping [2 ,3 ,4 ]
Chen, Zhong [1 ,2 ,3 ]
Affiliations
[1] Peking Univ, Inst Software, Sch EECS, Beijing, Peoples R China
[2] MoE Key Lab High Confidence Software Technol PKU, Beijing, Peoples R China
[3] MoE Key Lab Network & Software Secur Assurance PK, Beijing, Peoples R China
[4] Peking Univ, Sch Software & Microelect, Beijing, Peoples R China
Keywords
RSA; GPGPU; CUDA; Montgomery Multiplication; CRT;
DOI
10.1007/978-3-319-17533-1_31
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
RSA is a public-key cryptosystem widely used for end-to-end authentication and key exchange in various Internet protocols, such as SSL and TLS. Compared with symmetric cryptography, the cryptographic operations in RSA are much more time-consuming. This puts performance pressure on service providers that use secure protocols and hinders these protocols from being more widely adopted. Graphics Processing Units (GPUs) are increasingly used for compute-intensive, data-parallel general-purpose computing, and they often provide better throughput than CPUs at the same cost. In this paper, we propose a new approach to parallelizing Montgomery multiplication under the Single Instruction Multiple Thread (SIMT) threading model of GPUs, and we construct a parallel RSA implementation based on this approach, combined with other optimization techniques at both the algorithmic and implementation levels. The performance evaluation shows that our RSA implementation achieves record-breaking latency for RSA decryption on GPUs: 2.6 ms for RSA-1024 and 6.5 ms for RSA-2048. The peak throughput of our implementation reaches 34,981 decryptions per second for RSA-1024 and 5,244 for RSA-2048, which is much faster than existing integer-based implementations. The peak throughput of our implementation is slightly lower than that of the fastest floating-point-based implementation, while its latency is 3 times lower.
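The two building blocks named in the keywords, Montgomery multiplication and CRT, can be illustrated with a minimal sequential sketch of RSA-CRT decryption. This is not the authors' GPU method: it performs each Montgomery multiplication on Python integers in a single thread, whereas the paper distributes each multiplication across SIMT threads. The function names, the choice R = 2^k with k the bit length of the modulus, and the textbook toy key (p = 61, q = 53, e = 17, d = 2753) are illustrative assumptions.

```python
# Sequential sketch of RSA-CRT decryption using Montgomery multiplication.
# Hypothetical helper names; R = 2^k with k = bit length of the modulus is an
# assumed parameter choice, not taken from the paper's GPU implementation.

def mont_setup(n, r_bits):
    """Return R and n' with n * n' == -1 (mod R). Requires odd n."""
    r = 1 << r_bits
    n_prime = (-pow(n, -1, r)) % r  # modular inverse needs Python 3.8+
    return r, n_prime

def mont_mul(a, b, n, r_bits, n_prime):
    """Montgomery product a*b*R^-1 mod n (REDC), assuming a, b < n < R."""
    r_mask = (1 << r_bits) - 1
    t = a * b
    m = ((t & r_mask) * n_prime) & r_mask  # m = t * n' mod R
    u = (t + m * n) >> r_bits              # exact division by R; u < 2n
    return u - n if u >= n else u          # one conditional subtraction

def mont_pow(base, exp, n):
    """Left-to-right square-and-multiply, staying in Montgomery form."""
    r_bits = n.bit_length()
    r, n_prime = mont_setup(n, r_bits)
    x = (base * r) % n   # base converted to Montgomery form
    acc = r % n          # 1 in Montgomery form
    for bit in bin(exp)[2:]:
        acc = mont_mul(acc, acc, n, r_bits, n_prime)
        if bit == "1":
            acc = mont_mul(acc, x, n, r_bits, n_prime)
    return mont_mul(acc, 1, n, r_bits, n_prime)  # convert back out

def rsa_crt_decrypt(c, p, q, d):
    """Two half-size exponentiations recombined with Garner's formula."""
    dp, dq = d % (p - 1), d % (q - 1)
    q_inv = pow(q, -1, p)
    m1 = mont_pow(c % p, dp, p)
    m2 = mont_pow(c % q, dq, q)
    h = (q_inv * (m1 - m2)) % p
    return m2 + h * q

if __name__ == "__main__":
    p, q, e, d = 61, 53, 17, 2753  # textbook toy key, not a secure size
    n = p * q
    c = pow(65, e, n)
    print(rsa_crt_decrypt(c, p, q, d))  # prints 65
```

CRT replaces one exponentiation modulo the full n with two exponentiations modulo p and q, each with half-size operands and exponents; this is the algorithmic saving the keyword "CRT" refers to. Montgomery form avoids trial division in every modular reduction, which is why it is the operation worth parallelizing.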
Pages: 454-468
Number of pages: 15