Accelerating RSA with Fine-Grained Parallelism Using GPU

Cited by: 15
Authors
Yang, Yang [1, 2, 3]
Guan, Zhi [1, 2, 3]
Sun, Huiping [2, 3, 4]
Chen, Zhong [1, 2, 3]
Affiliations
[1] Peking Univ, Inst Software, Sch EECS, Beijing, Peoples R China
[2] MoE Key Lab High Confidence Software Technol PKU, Beijing, Peoples R China
[3] MoE Key Lab Network & Software Secur Assurance PK, Beijing, Peoples R China
[4] Peking Univ, Sch Software & Microelect, Beijing, Peoples R China
Keywords
RSA; GPGPU; CUDA; Montgomery Multiplication; CRT;
DOI
10.1007/978-3-319-17533-1_31
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
RSA is a public-key cryptosystem widely used for end-to-end authentication and key exchange in Internet protocols such as SSL and TLS. Compared with symmetric cryptography, the cryptographic operations in RSA are much more time-consuming. This puts performance pressure on service providers that use secure protocols and hinders wider adoption of those protocols. Graphics Processing Units (GPUs) are increasingly used for data-parallel general-purpose computing, and they often provide better throughput than CPUs at the same cost. In this paper, we propose a new approach to parallelizing Montgomery multiplication under the Single Instruction Multiple Thread (SIMT) threading model of GPUs, and construct a parallel RSA implementation based on this approach, combined with other optimization techniques at both the algorithmic and implementation levels. The performance evaluation shows that our RSA implementation achieves record-breaking latency for RSA decryption on GPUs: 2.6 ms for RSA-1024 and 6.5 ms for RSA-2048. The peak throughput of our implementation reaches 34,981 decryptions per second for RSA-1024 and 5,244 for RSA-2048, which is much faster than existing integer-based implementations. The peak throughput of our implementation is slightly lower than that of the fastest floating-point-based implementation, while its latency is 3 times lower.
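For readers unfamiliar with the two building blocks named in the abstract and keywords, the following is a minimal sequential sketch of Montgomery multiplication and CRT-based RSA decryption. It is an illustration only and does not reproduce the paper's SIMT-parallel scheme, which this record does not describe. The function names (`montgomery_setup`, `mont_mul`, `mont_pow`, `rsa_crt_decrypt`) and the toy key are assumptions made for the example, and the code is neither constant-time nor suitable for production use.

```python
# Sequential illustration only (Python 3.8+); all names here are hypothetical
# and this is NOT the parallel GPU algorithm proposed in the paper.

def montgomery_setup(n, bits):
    """Precompute constants for an odd modulus n with R = 2**bits > n."""
    r = 1 << bits
    n_prime = (-pow(n, -1, r)) % r   # n * n_prime == -1 (mod R)
    r2 = (r * r) % n                 # R^2 mod n, used to enter Montgomery form
    return r, n_prime, r2

def mont_mul(a, b, n, bits, n_prime):
    """Return a * b * R^-1 mod n for 0 <= a, b < n (Montgomery reduction)."""
    mask = (1 << bits) - 1
    t = a * b
    m = ((t & mask) * n_prime) & mask   # makes t + m*n divisible by R
    u = (t + m * n) >> bits             # exact division by R
    return u - n if u >= n else u       # single conditional subtraction

def mont_pow(base, exp, n):
    """Left-to-right square-and-multiply using Montgomery multiplication."""
    bits = n.bit_length()
    r, n_prime, r2 = montgomery_setup(n, bits)
    x = mont_mul(base % n, r2, n, bits, n_prime)   # base in Montgomery form
    acc = r % n                                    # 1 in Montgomery form
    for bit in bin(exp)[2:]:
        acc = mont_mul(acc, acc, n, bits, n_prime)
        if bit == "1":
            acc = mont_mul(acc, x, n, bits, n_prime)
    return mont_mul(acc, 1, n, bits, n_prime)      # leave Montgomery form

def rsa_crt_decrypt(c, p, q, d):
    """RSA decryption via CRT: two half-size exponentiations, then recombine."""
    dp, dq = d % (p - 1), d % (q - 1)
    q_inv = pow(q, -1, p)
    m1 = mont_pow(c, dp, p)
    m2 = mont_pow(c, dq, q)
    h = (q_inv * (m1 - m2)) % p          # Garner's recombination step
    return m2 + h * q

# Toy key for demonstration only: p = 61, q = 53, e = 17, d = 2753.
ciphertext = pow(65, 17, 61 * 53)
plaintext = rsa_crt_decrypt(ciphertext, 61, 53, 2753)
```

CRT replaces one exponentiation modulo n with two exponentiations modulo the half-size primes p and q, and Montgomery multiplication replaces the division in each modular reduction with shifts and masks.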
Pages: 454-468
Page count: 15
Related papers
50 in total
  • [31] Carbon: Architectural Support for Fine-Grained Parallelism on Chip Multiprocessors
    Kumar, Sanjeev
    Hughes, Christopher J.
    Nguyen, Anthony
    [J]. ISCA'07: 34TH ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE, CONFERENCE PROCEEDINGS, 2007, : 162 - 173
  • [32] Testing fine-grained parallelism for the ADMM on a factor-graph
    Hao, Ning
    Oghbaee, AmirReza
    Rostami, Mohammad
    Derbinsky, Nate
    Bento, Jose
    [J]. 2016 IEEE 30TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW), 2016, : 835 - 844
  • [33] Fine-Grained Crowdsourcing for Fine-Grained Recognition
    Jia Deng
    Krause, Jonathan
    Li Fei-Fei
    [J]. 2013 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2013, : 580 - 587
  • [34] BALANCING FINE-GRAINED AND MEDIUM-GRAINED PARALLELISM IN SCHEDULING LOOPS FOR THE XIMD ARCHITECTURE
    NEWBURN, CJ
    HUANG, AS
    SHEN, JP
    [J]. IFIP TRANSACTIONS A-COMPUTER SCIENCE AND TECHNOLOGY, 1993, 23 : 39 - 52
  • [35] Hierarchical Bucket Queuing for Fine-Grained Priority Scheduling on the GPU
    Kerbl, Bernhard
    Kenzel, Michael
    Schmalstieg, Dieter
    Seidel, Hans-Peter
    Steinberger, Markus
    [J]. COMPUTER GRAPHICS FORUM, 2017, 36 (08) : 232 - 246
  • [36] Fine-grained GPU parallelization of Pairwise Local Sequence Alignment
    Jain, Chirag
    Kumar, Subodh
    [J]. 2014 21ST INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING (HIPC), 2014,
  • [37] Efficient Sharing and Fine-Grained Scheduling of Virtualized GPU Resources
    Zhao, Xiaohui
    Yao, Jianguo
    Gao, Ping
    Guan, Haibing
    [J]. 2018 IEEE 38TH INTERNATIONAL CONFERENCE ON DISTRIBUTED COMPUTING SYSTEMS (ICDCS), 2018, : 742 - 752
  • [38] GPUPool: A Holistic Approach to Fine-Grained GPU Sharing in the Cloud
    Tan, Xiaodan Serina
    Golikov, Pavel
    Vijaykumar, Nandita
    Pekhimenko, Gennady
    [J]. PROCEEDINGS OF THE 2022 31ST INTERNATIONAL CONFERENCE ON PARALLEL ARCHITECTURES AND COMPILATION TECHNIQUES, PACT 2022, 2022, : 317 - 332
  • [39] cuBLASTP: Fine-Grained Parallelization of Protein Sequence Search on a GPU
    Zhang, Jing
    Wang, Hao
    Lin, Heshan
    Feng, Wu-Chun
    [J]. 2014 IEEE 28TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM, 2014,
  • [40] Fine-Grained Parallelization of a Vlasov-Poisson Application on GPU
    Latu, Guillaume
    [J]. EURO-PAR 2010 PARALLEL PROCESSING WORKSHOPS, 2011, 6586 : 127 - 135