TCPM: A Reconfigurable and Efficient Toom-Cook-Based Polynomial Multiplier Over Rings Using a Novel Compressed Postprocessing Algorithm

Cited by: 3
Authors
Wang, Jianfei [1 ]
Yang, Chen [1 ]
Zhang, Fahong [1 ]
Meng, Yishuo [1 ]
Su, Yang [2 ]
Affiliations
[1] Xi An Jiao Tong Univ, Sch Microelect, Xian 710049, Shaanxi, Peoples R China
[2] Engn Univ Peoples Armed Police, Sch Cryptog Engn, Xian 710086, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Karatsuba; polynomial multiplier; polynomial over rings; ring learning with error (RLWE); Toom-Cook; HOMOMORPHIC ENCRYPTION; ACCELERATOR;
DOI
10.1109/TVLSI.2023.3277865
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812 ;
Abstract
Polynomial multiplication over rings is a significant bottleneck of ring learning with error (RLWE)-based encryption. Three algorithms are widely used to speed it up: the number theoretic transform (NTT), Schoolbook, and Toom-Cook. Compared with Schoolbook and NTT, Toom-Cook can achieve a better trade-off between performance and flexibility. However, Toom-Cook postprocessing contains many redundant steps and calculations that have not been eliminated. Therefore, we propose an efficient, compressed, and fused Toom-Cook postprocessing algorithm that reduces the number of steps and removes at least 33.33% of the arithmetic operations of postprocessing. A highly reconfigurable and efficient Toom-Cook-based polynomial multiplier (TCPM) is proposed to speed up polynomial multiplication over rings. In TCPM, a high-throughput and efficient heterogeneous processing element (PE) array is designed to exploit the parallelism of Toom-Cook, and, based on the compressed algorithm, the PE array for postprocessing is scaled down. In addition, because it is provided with a reconfigurable evaluation module, a flexible polynomial data storage module, and a universal PE array, TCPM can efficiently map and execute Toom-Cook-2, 3, and 4 on a unified hardware architecture. Implemented on the Xilinx VC709 field-programmable gate array (FPGA) platform, TCPM can perform a Toom-Cook-4-based 256 × 256 polynomial multiplication over rings, with a modulus that is either a power of two or a prime, every 3.28 μs at a 360-MHz clock frequency. It achieves a 2.47× to 50.11× speedup compared with previous designs.
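For readers unfamiliar with the three Toom-Cook phases the abstract refers to (evaluation, pointwise multiplication, postprocessing), the following is a minimal software sketch of one level of Toom-Cook-2, which coincides with Karatsuba. It is not the paper's compressed postprocessing algorithm or its hardware mapping. The function names, the ring Z_q[x]/(x^n + 1), and the sample parameters are assumptions chosen for illustration only.

```python
def schoolbook(a, b):
    """Plain O(n^2) product of two coefficient lists (lowest degree first)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def toom2(a, b):
    """One level of Toom-Cook-2 for equal, even-length coefficient lists."""
    h = len(a) // 2
    a0, a1 = a[:h], a[h:]
    b0, b1 = b[:h], b[h:]
    # Evaluation at the points 0, 1 and infinity.
    a_sum = [x + y for x, y in zip(a0, a1)]
    b_sum = [x + y for x, y in zip(b0, b1)]
    # Pointwise multiplication: three half-size products instead of four.
    p0 = schoolbook(a0, b0)
    p1 = schoolbook(a_sum, b_sum)
    pinf = schoolbook(a1, b1)
    # Postprocessing: interpolate the middle term, then recombine.
    mid = [m - x - y for m, x, y in zip(p1, p0, pinf)]
    out = [0] * (2 * len(a) - 1)
    for i in range(len(p0)):
        out[i] += p0[i]
        out[i + h] += mid[i]
        out[i + 2 * h] += pinf[i]
    return out


def reduce_ring(c, n, q):
    """Reduce modulo x^n + 1 and modulo q (an assumed ring, not from the source)."""
    r = [0] * n
    for i, ci in enumerate(c):
        if i < n:
            r[i] += ci
        else:
            r[i - n] -= ci  # x^n is congruent to -1
    return [x % q for x in r]


# Hypothetical toy parameters: n = 4, q = 17.
a = [1, 2, 3, 4]
b = [5, 6, 7, 8]
product = reduce_ring(toom2(a, b), n=4, q=17)
```

Toom-Cook-3 and Toom-Cook-4 follow the same pattern with more evaluation points and a larger interpolation step, which is where postprocessing cost grows and where the abstract reports its savings.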
Pages: 1153-1166
Page count: 14