Parallel Implementations of ARX-Based Block Ciphers on Graphic Processing Units

Cited by: 5
Authors
An, SangWoo [1]
Kim, YoungBeom [2]
Kwon, Hyeokdong [3]
Seo, Hwajeong [3]
Seo, Seog Chung [2]
Affiliations
[1] Kookmin Univ, Dept Financial Informat Secur, Seoul 02707, South Korea
[2] Kookmin Univ, Dept Informat Secur Cryptol & Math, Seoul 02707, South Korea
[3] Hansung Univ, Div IT Convergence Engn, Seoul 02876, South Korea
Funding
National Research Foundation of Singapore;
Keywords
CHAM; LEA; HIGHT; Graphic Processing Unit (GPU); CUDA; Counter (CTR) mode; parallel processing;
DOI
10.3390/math8111894
Chinese Library Classification number
O1 [Mathematics];
Subject classification code
0701 ; 070101 ;
Abstract
With the development of information and communication technology, many types of Internet of Things (IoT) devices have come into wide use for convenient services. Many users request services from servers through their IoT devices, so the amount of personal information that servers need to protect has increased dramatically. To protect this information quickly and safely, the speed of the encryption process must be optimized. Because it is difficult for a server to provide its basic services while encrypting a large amount of data on the CPU, several parallel optimization methods using Graphics Processing Units (GPUs) have been considered. In this paper, we propose several GPU optimization techniques for the efficient server-side implementation of lightweight block cipher algorithms. As target algorithms, we select high security and light weight (HIGHT), Lightweight Encryption Algorithm (LEA), and revised CHAM, which are Add-Rotate-Xor (ARX)-based block ciphers, because they are widely used on IoT devices. We exploit the features of the counter (CTR) operation mode to reduce unnecessary memory copying and operations in the GPU environment. In addition, we optimize memory usage by making full use of the GPU's on-chip memory, such as registers and shared memory, and implement the core function of each target algorithm in inline PTX assembly to maximize performance. With our optimization methods and handcrafted PTX code, we achieve encryption throughputs of 468, 2593, and 3063 Gbps for HIGHT, LEA, and revised CHAM, respectively, on an NVIDIA RTX 2070 GPU. We also present optimized implementations of the Counter Mode Based Deterministic Random Bit Generator (CTR_DRBG), one of the most widely used deterministic random bit generators, to provide a large amount of random data to connected IoT devices. Applying several optimization techniques to CTR_DRBG, we achieve performance improvements of 52.2, 24.8, and 34.2 times over a CPU-side CTR_DRBG implementation when HIGHT-64/128, LEA-128/128, and CHAM-128/128, respectively, are used as the underlying block cipher.
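The property of CTR mode that the abstract relies on can be illustrated with a minimal Python sketch. It is not the paper's implementation: `toy_arx_encrypt` is a hypothetical 64-bit Add-Rotate-Xor function invented for illustration (it is not HIGHT, LEA, or CHAM and has no security value), and all function names, the nonce, and the round keys are assumptions. The point it shows is that each keystream block depends only on the nonce and the block index, so a GPU kernel could plausibly assign one block index to each thread, which is sequential here only for readability.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF
BLOCK_BYTES = 8  # toy 64-bit block size (assumption for this sketch)


def rotl32(x, r):
    """Rotate a 32-bit word left by r bits (0 < r < 32)."""
    return ((x << r) | (x >> (32 - r))) & MASK32


def toy_arx_encrypt(block, round_keys):
    """Hypothetical toy ARX function on a 64-bit block. NOT a real cipher."""
    left, right = block >> 32, block & MASK32
    for rk in round_keys:
        mixed = (left + (right ^ rk)) & MASK32  # Xor with key, then Add
        left, right = right, rotl32(mixed, 7)   # Rotate, then swap halves
    return (left << 32) | right


def keystream_block(nonce, index, round_keys):
    """Keystream for block `index`; depends only on (nonce, index, keys)."""
    counter = (nonce + index) & MASK64
    return toy_arx_encrypt(counter, round_keys).to_bytes(BLOCK_BYTES, "big")


def ctr_xor(data, nonce, round_keys):
    """CTR mode: encryption and decryption are the same XOR operation."""
    out = bytearray()
    n_blocks = (len(data) + BLOCK_BYTES - 1) // BLOCK_BYTES
    for index in range(n_blocks):  # on a GPU, one thread per index
        chunk = data[index * BLOCK_BYTES:(index + 1) * BLOCK_BYTES]
        ks = keystream_block(nonce, index, round_keys)
        out.extend(c ^ k for c, k in zip(chunk, ks))
    return bytes(out)
```

Because the counter values are computed from the block index rather than read from memory, only the data being encrypted has to be transferred in such a layout; this is one reading of the reduced memory copying that the abstract mentions, not a confirmed detail of the authors' code.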
Pages: 1-25
Page count: 25