ConvKyber: Unleashing the Power of AI Accelerators for Faster Kyber with Novel Iteration-based Approaches

Authors
Zhou T. [1]
Zheng F. [2]
Fan G. [3]
Wan L. [2]
Tang W. [1]
Song Y. [3]
Bian Y. [4]
Lin J. [1, 5]
Affiliations
[1] School of Cyber Security, University of Science and Technology of China, Hefei
[2] School of Cryptology, University of Chinese Academy of Sciences, Beijing
[3] Ant Group, Hangzhou
[4] School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing
[5] Beijing Research Institute, University of Science and Technology of China, Beijing
Funding
National Natural Science Foundation of China
Keywords
GPUs; Kyber; Lattice-based Cryptography; Tensor Core
DOI
10.46586/tches.v2024.i2.25-63
Abstract
The remarkable performance capabilities of AI accelerators offer promising opportunities for accelerating cryptographic algorithms, particularly in the context of lattice-based cryptography. However, current approaches to leveraging AI accelerators often remain at a rudimentary level of implementation, overlooking the intricate internal mechanisms of these devices. Consequently, a significant share of the available computational resources is underutilized. In this paper, we present a comprehensive exploration of NVIDIA Tensor Cores and introduce a novel framework tailored specifically for Kyber. Firstly, we propose two innovative approaches that efficiently break down Kyber's NTT into iterative matrix multiplications, resulting in approximately a 75% reduction in costs compared to the state-of-the-art scanning-based methods. Secondly, by reverse-engineering the internal mechanisms, we precisely manipulate the internal resources of Tensor Cores using assembly-level code instead of inefficient standard interfaces, eliminating memory accesses and redundant function calls. Finally, building upon our highly optimized NTT, we provide a complete implementation for all parameter sets of Kyber. Our implementation surpasses the state-of-the-art Tensor Core based work, achieving speed-ups of 1.93x, 1.65x, 1.22x and 3.55x for polyvec_ntt, KeyGen, Enc and Dec in Kyber-1024, respectively. Although it is throughput-oriented, our full Kyber implementation maintains an acceptable execution latency: for instance, the execution latency ranges from 1.02 to 5.68 milliseconds for Kyber-1024 on R3080 when achieving the peak throughput. © 2024, Ruhr-University of Bochum. All rights reserved.
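The abstract's central idea, recasting an NTT as matrix multiplications that a dense matrix engine such as a Tensor Core can execute, can be illustrated with a minimal sketch. It rests on several simplifying assumptions: it computes a plain cyclic NTT of small length n over Kyber's modulus q = 3329, using 17 (a primitive 256th root of unity mod q) to derive the n-th root, rather than Kyber's negacyclic, incomplete 256-point NTT; and it uses the textbook four-step (Cooley-Tukey) split n = n1·n2, not the paper's two iteration-based approaches. The helper names `matmul_mod`, `root_of_unity`, `ntt_direct` and `ntt_four_step` are hypothetical, and plain Python integers stand in for Tensor Core fragments.

```python
# Sketch only: a simplified cyclic NTT mod Kyber's q, NOT Kyber's negacyclic
# incomplete NTT and NOT the paper's iteration-based decomposition.
Q = 3329    # Kyber's modulus
ZETA = 17   # primitive 256th root of unity mod Q

def matmul_mod(a, b, q=Q):
    """Multiply matrices a (r x m) and b (m x c), reducing every entry mod q."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [[sum(a[i][t] * b[t][j] for t in range(inner)) % q
             for j in range(cols)] for i in range(rows)]

def root_of_unity(n, q=Q):
    """Primitive n-th root of unity mod q, valid when n divides 256."""
    assert 256 % n == 0
    return pow(ZETA, 256 // n, q)

def ntt_direct(x, q=Q):
    """Cyclic NTT as one n x n matrix-vector product: X[k] = sum_j x[j] * w^(j*k)."""
    n = len(x)
    w = root_of_unity(n, q)
    vandermonde = [[pow(w, j * k, q) for j in range(n)] for k in range(n)]
    column = [[v] for v in x]
    return [row[0] for row in matmul_mod(vandermonde, column, q)]

def ntt_four_step(x, n1, n2, q=Q):
    """Same transform via two batched small matrix products plus twiddles."""
    n = n1 * n2
    assert len(x) == n
    w = root_of_unity(n, q)
    # Step 1: arrange the input as an n1 x n2 matrix, A[j1][j2] = x[j1 + n1*j2].
    a = [[x[j1 + n1 * j2] for j2 in range(n2)] for j1 in range(n1)]
    # Step 2: length-n2 transforms of all rows at once, as A @ F2.
    f2 = [[pow(w, n1 * j2 * k2, q) for k2 in range(n2)] for j2 in range(n2)]
    b = matmul_mod(a, f2, q)
    # Step 3: element-wise twiddle factors w^(j1*k2).
    c = [[b[j1][k2] * pow(w, j1 * k2, q) % q for k2 in range(n2)]
         for j1 in range(n1)]
    # Step 4: length-n1 transforms of all columns at once, as F1 @ C.
    f1 = [[pow(w, n2 * k1 * j1, q) for j1 in range(n1)] for k1 in range(n1)]
    d = matmul_mod(f1, c, q)
    # Step 5: read out in transposed order, X[k2 + n2*k1] = D[k1][k2].
    out = [0] * n
    for k1 in range(n1):
        for k2 in range(n2):
            out[k2 + n2 * k1] = d[k1][k2]
    return out

if __name__ == "__main__":
    x = [(7 * i + 3) % Q for i in range(16)]
    print(ntt_four_step(x, 4, 4) == ntt_direct(x))  # True
```

The direct form needs a full n×n product, whereas the four-step form replaces it with two batched products of fixed small shapes (n1×n2 by n2×n2, then n1×n1 by n1×n2); fixed-shape products of this kind are what matrix units such as Tensor Cores are built to execute.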
Pages: 25-63
Number of pages: 38