Area-Time Efficient Architecture of FFT-Based Montgomery Multiplication

Cited by: 15
Authors
Dai, Wangchen [1]
Chen, Donald Donglong [1]
Cheung, Ray C. C. [1]
Koc, Cetin Kaya [2]
Affiliations
[1] City Univ Hong Kong, Dept Elect Engn, Kowloon, Hong Kong, Peoples R China
[2] Univ Calif Santa Barbara, Dept Comp Sci, Santa Barbara, CA 93106 USA
Keywords
Montgomery modular multiplication; number-theoretic weighted transform; fast Fourier transform (FFT); field-programmable gate array (FPGA); modular multiplication; exponentiation; transforms
DOI
10.1109/TC.2016.2601334
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Modular multiplication is the most time-consuming operation in number-theoretic cryptographic algorithms involving large integers, such as RSA and Diffie-Hellman. Implementations reveal that more than 75 percent of RSA execution time is spent in the modular multiplication function for moduli of more than 1,024 bits. Fast multiplier architectures exist that minimize delay and increase throughput using parallelism and pipelining. However, such designs are large in area and low in efficiency. In this paper, we integrate the fast Fourier transform (FFT) method into McLaughlin's framework and present an improved FFT-based Montgomery modular multiplication (MMM) algorithm that achieves high area-time efficiency. In contrast to previous FFT-based designs, we avoid the zero-padding operation by computing the modular multiplication steps directly using cyclic and nega-cyclic convolutions, which halves the convolution length. Furthermore, supported by the number-theoretic weighted transform, the FFT algorithm is used to compute the convolutions quickly. We also introduce a general method for efficient parameter selection for the proposed algorithm. Architectures with single and double butterfly structures are designed to obtain low area-latency solutions, and we implemented them on Xilinx Virtex-6 FPGAs. The results show that our work offers better area-latency efficiency than the state-of-the-art FFT-based MMM architectures for operand sizes of 1,024 bits and above. We obtained area-latency efficiency improvements of up to 50.9 percent for 1,024-bit, 41.9 percent for 2,048-bit, 37.8 percent for 4,096-bit, and 103.2 percent for 7,680-bit operands. Furthermore, for transforms of length 64 and above, our design also achieves lower operating latency, with a high clock frequency.
Pages: 375-388
Page count: 14
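
The abstract's point about avoiding zero padding rests on a standard identity: a length-d cyclic convolution of digit vectors gives the product modulo base^d - 1, and a length-d nega-cyclic convolution gives it modulo base^d + 1, so no length-2d transform is needed. The sketch below only illustrates that identity and is not the architecture from the paper. It uses a naive quadratic convolution instead of an FFT or number-theoretic weighted transform, all function names are hypothetical, and it assumes both operands are smaller than base^d.

```python
def to_digits(x, base, d):
    """Split a non-negative integer x < base**d into d little-endian digits."""
    digits = []
    for _ in range(d):
        digits.append(x % base)
        x //= base
    return digits


def from_digits(coeffs, base):
    """Evaluate a coefficient vector at `base` (coefficients may be negative)."""
    return sum(c * base**i for i, c in enumerate(coeffs))


def cyclic_convolution(a, b):
    """Length-d cyclic convolution: index i + j wraps around modulo d."""
    d = len(a)
    out = [0] * d
    for i in range(d):
        for j in range(d):
            out[(i + j) % d] += a[i] * b[j]
    return out


def negacyclic_convolution(a, b):
    """Length-d nega-cyclic convolution: wrapped terms are subtracted."""
    d = len(a)
    out = [0] * d
    for i in range(d):
        for j in range(d):
            k = i + j
            if k < d:
                out[k] += a[i] * b[j]
            else:
                out[k - d] -= a[i] * b[j]
    return out


def mul_mod_cyclic(x, y, base, d):
    """Return x*y mod (base**d - 1), assuming x, y < base**d."""
    c = cyclic_convolution(to_digits(x, base, d), to_digits(y, base, d))
    return from_digits(c, base) % (base**d - 1)


def mul_mod_negacyclic(x, y, base, d):
    """Return x*y mod (base**d + 1), assuming x, y < base**d."""
    c = negacyclic_convolution(to_digits(x, base, d), to_digits(y, base, d))
    return from_digits(c, base) % (base**d + 1)


if __name__ == "__main__":
    x, y = 0xDEADBEEF, 0x12345678
    base, d = 2**8, 4  # four 8-bit digits, so base**d = 2**32
    print(mul_mod_cyclic(x, y, base, d) == (x * y) % (2**32 - 1))
    print(mul_mod_negacyclic(x, y, base, d) == (x * y) % (2**32 + 1))
```

Both convolutions work on vectors of length d, whereas a plain product computed by convolution would need the operands zero-padded to length 2d. That difference is the halving of the convolution length the abstract mentions.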