Accelerating RSA with Fine-Grained Parallelism Using GPU

Cited by: 15

Authors:

Yang, Yang [1,2,3]

Guan, Zhi [1,2,3]

Sun, Huiping [2,3,4]

Chen, Zhong [1,2,3]

Affiliations:

[1] Peking Univ, Inst Software, Sch EECS, Beijing, Peoples R China

[2] MoE Key Lab High Confidence Software Technol PKU, Beijing, Peoples R China

[3] MoE Key Lab Network & Software Secur Assurance PK, Beijing, Peoples R China

[4] Peking Univ, Sch Software & Microelect, Beijing, Peoples R China

Source:

INFORMATION SECURITY PRACTICE AND EXPERIENCE, ISPEC 2015 | 2015 / Vol. 9065

Keywords:

RSA; GPGPU; CUDA; Montgomery Multiplication; CRT;

DOI:

10.1007/978-3-319-17533-1_31

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

RSA is a public-key cryptosystem widely used for end-to-end authentication and key exchange in various Internet protocols, such as SSL and TLS. Compared with symmetric cryptography, the cryptographic operations in RSA are much more time-consuming. This puts performance pressure on service providers that use secure protocols and hinders wider adoption of these protocols. Graphics Processing Units (GPUs) are increasingly used for data-parallel, compute-intensive general-purpose computing, and they often provide better throughput than CPUs at the same cost. In this paper, we propose a new approach to parallelizing Montgomery multiplication under the Single Instruction Multiple Thread (SIMT) threading model of GPUs, and construct a parallel RSA implementation based on this approach, combined with other optimization techniques at both the algorithmic level and the implementation level. The performance evaluation shows that our RSA implementation achieves record-breaking latency for RSA decryption on GPUs: 2.6 ms for RSA-1024 and 6.5 ms for RSA-2048. The peak throughput of our implementation reaches 5,244 decryptions per second for RSA-2048 and 34,981 for RSA-1024, which is much faster than existing integer-based implementations. The peak throughput of our implementation is slightly lower than that of the fastest floating-point-based implementation, while its latency is 3 times faster.
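For readers unfamiliar with the two building blocks named in the abstract and keywords (Montgomery multiplication and CRT), the following is a minimal sequential Python sketch of how they fit together in RSA decryption. It is not the paper's GPU method: it does not show the SIMT parallelization, and the function names (`montgomery_setup`, `mont_mul`, `mont_pow`, `rsa_crt_decrypt`) and the toy key in the usage note are illustrative assumptions, not taken from the paper.

```python
# Illustrative, sequential sketch only (requires Python 3.8+ for pow(x, -1, m)).
# All names here are hypothetical; the paper's parallel GPU scheme is not shown.

def montgomery_setup(n, k):
    """Precompute constants for an odd modulus n, with R = 2**k > n."""
    r = 1 << k
    n_prime = (-pow(n, -1, r)) % r   # satisfies n * n_prime = -1 (mod R)
    r2 = (r * r) % n                 # used to convert into Montgomery form
    return r, n_prime, r2

def mont_mul(a, b, n, k, n_prime):
    """Montgomery product a * b * R^{-1} mod n, for 0 <= a, b < n."""
    mask = (1 << k) - 1
    t = a * b
    m = ((t & mask) * n_prime) & mask   # m = (t mod R) * n' mod R
    u = (t + m * n) >> k                # exact division by R
    return u - n if u >= n else u       # single conditional subtraction

def mont_pow(base, exp, n):
    """Left-to-right square-and-multiply using Montgomery products."""
    k = n.bit_length()
    r, n_prime, r2 = montgomery_setup(n, k)
    x = mont_mul(base % n, r2, n, k, n_prime)   # base in Montgomery form
    acc = r % n                                 # 1 in Montgomery form
    for bit in bin(exp)[2:]:
        acc = mont_mul(acc, acc, n, k, n_prime)
        if bit == "1":
            acc = mont_mul(acc, x, n, k, n_prime)
    return mont_mul(acc, 1, n, k, n_prime)      # convert back out

def rsa_crt_decrypt(c, p, q, d):
    """RSA decryption via CRT: two half-size exponentiations, then recombine."""
    dp, dq = d % (p - 1), d % (q - 1)
    q_inv = pow(q, -1, p)
    m1 = mont_pow(c % p, dp, p)
    m2 = mont_pow(c % q, dq, q)
    h = (q_inv * (m1 - m2)) % p     # Garner's recombination step
    return m2 + h * q
```

As a usage example with a deliberately tiny, insecure toy key (p = 61, q = 53, e = 17, d = 2753), `rsa_crt_decrypt(pow(65, 17, 61 * 53), 61, 53, 2753)` recovers the plaintext 65. CRT matters for performance because it replaces one full-size exponentiation with two half-size ones, and Montgomery multiplication avoids trial division in each modular product.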


Pages: 454-468

Page count: 15

50 items in total

[1] Accelerating a Lossy Compression Method with Fine-Grained Parallelism on a GPU
Wu, Yifan
Shen, Jingcheng
Okita, Masao
Ino, Fumihiko
[J]. PAAP 2021: 2021 12TH INTERNATIONAL SYMPOSIUM ON PARALLEL ARCHITECTURES, ALGORITHMS AND PROGRAMMING, 2021, : 76 - 81
[2] FINE-GRAINED PARALLELISM IN ELLIE
ANDERSEN, B
[J]. JOURNAL OF OBJECT-ORIENTED PROGRAMMING, 1992, 5 (03): : 55 - 61
[3] Fine-grained parallelism in computational mathematics
Bandman, OL
[J]. PROGRAMMING AND COMPUTER SOFTWARE, 2001, 27 (04) : 170 - 182
[4] Fine-Grained Parallelism in Computational Mathematics
O. L. Bandman
[J]. Programming and Computer Software, 2001, 27 : 170 - 182
[5] Fine-grained parallelism accelerating for RNA secondary structure prediction with pseudoknots based on FPGA
Xia, Fei
Jin, Guoqing
[J]. JOURNAL OF BIOINFORMATICS AND COMPUTATIONAL BIOLOGY, 2014, 12 (03)
[6] Evaluation of Fine-grained Parallelism in AUTOSAR Applications
Stegmeier, Alexander
Kehr, Sebastian
George, Dave
Bradatsch, Christian
Panic, Milos
Bodekker, Bert
Ungerer, Theo
[J]. INTERNATIONAL CONFERENCE ON EMBEDDED COMPUTER SYSTEMS: ARCHITECTURES, MODELING, AND SIMULATION (SAMOS 2017), 2017, : 121 - 128
[7] Graph Analytics Through Fine-Grained Parallelism
Shang, Zechao
Li, Feifei
Yu, Jeffrey Xu
Zhang, Zhiwei
Cheng, Hong
[J]. SIGMOD'16: PROCEEDINGS OF THE 2016 INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2016, : 463 - 478
[8] Exploiting Fine-Grained Parallelism on Cell Processors
Hoffmann, Ralf
Prell, Andreas
Rauber, Thomas
[J]. EURO-PAR 2010 - PARALLEL PROCESSING, PART II, 2010, 6272 : 175 - 186
[9] A MATCHING APPROACH TO UTILIZING FINE-GRAINED PARALLELISM
GUPTA, R
SOFFA, ML
[J]. PROCEEDINGS OF THE TWENTY-FIRST, ANNUAL HAWAII INTERNATIONAL CONFERENCE ON SYSTEM SCIENCES, VOLS 1-4: ARCHITECTURE TRACK, SOFTWARE TRACK, DECISION SUPPORT AND KNOWLEDGE BASED SYSTEMS TRACK, APPLICATIONS TRACK, 1988, : 148 - 156
[10] Implementation of Algorithms with a Fine-Grained Parallelism on GPUs
Kalgin, K. V.
[J]. NUMERICAL ANALYSIS AND APPLICATIONS, 2011, 4 (01) : 46 - 55
