Generating High-Performance Number Theoretic Transform Implementations for Vector Architectures

Cited by: 0
Authors
Zhang, Naifeng [1]
Ebel, Austin [2]
Neda, Negar [2]
Brinich, Patrick [3]
Reynwar, Benedict [4]
Schmidt, Andrew G. [4]
Franusich, Mike
Johnson, Jeremy [3]
Reagen, Brandon [2]
Franchetti, Franz [1]
Affiliations
[1] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
[2] New York Univ, New York, NY USA
[3] Drexel Univ, Philadelphia, PA 19104 USA
[4] USC Informat Sci, Marina Del Rey, CA USA
Keywords
Fully homomorphic encryption; number theoretic transform; SPIRAL; code generation; vectorization; fast Fourier transform
DOI
10.1109/HPEC58863.2023.10363559
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Fully homomorphic encryption (FHE) allows computations to be performed directly on encrypted data by encoding numerical vectors onto mathematical structures. However, adoption of FHE is hindered by substantial overheads that make it impractical for many applications. Number theoretic transforms (NTTs) are a key optimization for FHE because they accelerate vector convolutions. Toward practical use of FHE, we propose using SPIRAL, a code generator known for producing efficient linear transform implementations, to generate high-performance NTT code for vector architectures. We identify suitable NTT algorithms and translate their dataflow graphs into SPIRAL's internal mathematical representations. We then implement the entire workflow required to generate efficient vectorized NTT code. In this work, we target the Ring Processing Unit (RPU), a multi-tile long-vector accelerator designed for FHE computations. On average, the SPIRAL-generated NTT kernel achieves a 1.7x speedup over naive implementations on the RPU, demonstrating the effectiveness of our approach for maximizing NTT performance on vector architectures.
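
The abstract's claim that NTTs accelerate vector convolutions can be illustrated with a minimal sketch. It is not the SPIRAL-generated kernel, nor any algorithm the paper is confirmed to use. It is a plain recursive radix-2 NTT, and the modulus `P = 17`, length `N = 8`, root of unity `OMEGA = 9`, and all function names are assumptions chosen only to keep the example small. It computes a cyclic convolution as a pointwise product in the transform domain and compares it with the direct O(n^2) computation.

```python
# Illustrative only: toy parameters, not taken from the paper.
P = 17      # assumed prime modulus with 8 | (P - 1)
N = 8       # assumed transform length (power of two)
OMEGA = 9   # assumed primitive 8th root of unity mod 17 (9**4 % 17 == 16 == -1)


def ntt(a, omega, p):
    """Recursive radix-2 decimation-in-time NTT of a modulo prime p.

    len(a) must be a power of two and omega a primitive len(a)-th root of unity.
    """
    n = len(a)
    if n == 1:
        return list(a)
    omega_sq = omega * omega % p
    even = ntt(a[0::2], omega_sq, p)
    odd = ntt(a[1::2], omega_sq, p)
    out = [0] * n
    w = 1
    for k in range(n // 2):
        t = w * odd[k] % p
        out[k] = (even[k] + t) % p            # butterfly: upper half
        out[k + n // 2] = (even[k] - t) % p   # uses omega**(n/2) == -1
        w = w * omega % p
    return out


def intt(a_hat, omega, p):
    """Inverse NTT: forward transform with omega^-1, then scale by n^-1."""
    n = len(a_hat)
    inv_omega = pow(omega, -1, p)  # modular inverse, Python 3.8+
    inv_n = pow(n, -1, p)
    return [x * inv_n % p for x in ntt(a_hat, inv_omega, p)]


def cyclic_convolution_ntt(a, b, omega, p):
    """Cyclic convolution via pointwise multiplication in the NTT domain."""
    a_hat = ntt(a, omega, p)
    b_hat = ntt(b, omega, p)
    return intt([x * y % p for x, y in zip(a_hat, b_hat)], omega, p)


def cyclic_convolution_naive(a, b, p):
    """Direct O(n^2) cyclic convolution, used as a reference."""
    n = len(a)
    out = [0] * n
    for i in range(n):
        for j in range(n):
            out[(i + j) % n] = (out[(i + j) % n] + a[i] * b[j]) % p
    return out


if __name__ == "__main__":
    a = [1, 2, 3, 4, 0, 0, 0, 0]
    b = [5, 6, 7, 0, 0, 0, 0, 0]
    print(cyclic_convolution_ntt(a, b, OMEGA, P))
    print(cyclic_convolution_naive(a, b, P))
```

Both calls should print the same list. The transform-based path needs O(n log n) modular operations rather than O(n^2), and that gap is what makes the NTT worth optimizing for large vectors.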
Pages: 7