Area-Time Efficient Architecture of FFT-Based Montgomery Multiplication

Cited by: 15
Authors
Dai, Wangchen [1]
Chen, Donald Donglong [1]
Cheung, Ray C. C. [1]
Koc, Cetin Kaya [2]
Affiliations
[1] City Univ Hong Kong, Dept Elect Engn, Kowloon, Hong Kong, Peoples R China
[2] Univ Calif Santa Barbara, Dept Comp Sci, Santa Barbara, CA 93106 USA
Keywords
Montgomery modular multiplication; number-theoretic weighted transform; fast Fourier transform (FFT); field-programmable gate array (FPGA); modular multiplication; exponentiation; transforms
DOI
10.1109/TC.2016.2601334
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Modular multiplication is the most time-consuming operation in number-theoretic cryptographic algorithms that involve large integers, such as RSA and Diffie-Hellman. Implementations reveal that, for moduli longer than 1,024 bits, more than 75 percent of RSA's execution time is spent in the modular multiplication function. Fast multiplier architectures minimize delay and increase throughput through parallelism and pipelining. However, such designs are large in area and low in efficiency. In this paper, we integrate the fast Fourier transform (FFT) method into McLaughlin's framework and present an improved FFT-based Montgomery modular multiplication (MMM) algorithm that achieves high area-time efficiency. Unlike previous FFT-based designs, we avoid the zero-padding operation by computing the modular multiplication steps directly with cyclic and nega-cyclic convolutions, which halves the convolution length. Furthermore, the number-theoretic weighted transform allows the FFT algorithm to provide fast convolution computation. We also introduce a general method for efficient parameter selection for the proposed algorithm. We design architectures with single and double butterfly structures to obtain low area-latency solutions and implement them on Xilinx Virtex-6 FPGAs. The results show that our work offers better area-latency efficiency than state-of-the-art FFT-based MMM architectures for operand sizes of 1,024 bits and above. We obtained area-latency efficiency improvements of up to 50.9 percent for 1,024-bit, 41.9 percent for 2,048-bit, 37.8 percent for 4,096-bit, and 103.2 percent for 7,680-bit operands. Furthermore, for transform lengths of 64 and above, our designs also achieve lower operating latency at high clock frequencies.
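The following is a minimal Python sketch of the idea behind avoiding zero-padding. It shows how one length-n cyclic convolution and one length-n nega-cyclic convolution together carry the same information as a single zero-padded length-2n convolution. It is not the authors' MMM algorithm or architecture, and it omits the Montgomery reduction steps of McLaughlin's framework. The prime P, the primitive root G, the digit width B, the length N_DIGITS, and all function names are hypothetical choices made for illustration.

The cyclic convolution yields x*y mod (2^N - 1). Weighting the inputs by powers of a primitive 2n-th root of unity turns the same transform into a nega-cyclic convolution, which yields x*y mod (2^N + 1). The Chinese remainder theorem then recovers the full product, because x*y < 2^(2N) - 1.

```python
# Hypothetical parameters for illustration only (not taken from the paper).
P = 998244353   # NTT-friendly prime, P = 119 * 2**23 + 1
G = 3           # primitive root modulo P
B = 8           # digit width in bits
N_DIGITS = 8    # transform length n (power of two)


def ntt(a, w):
    """Radix-2 Cooley-Tukey transform over Z_P; w is a primitive len(a)-th root of unity."""
    n = len(a)
    if n == 1:
        return list(a)
    w2 = w * w % P
    even, odd = ntt(a[0::2], w2), ntt(a[1::2], w2)
    out, t = [0] * n, 1
    for k in range(n // 2):
        v = odd[k] * t % P
        out[k] = (even[k] + v) % P            # butterfly: E_k + w^k O_k
        out[k + n // 2] = (even[k] - v) % P   # butterfly: E_k - w^k O_k
        t = t * w % P
    return out


def convolve(a, b, negacyclic):
    """Length-n cyclic (mod x^n - 1) or nega-cyclic (mod x^n + 1) convolution."""
    n = len(a)
    psi = pow(G, (P - 1) // (2 * n), P)   # primitive 2n-th root of unity
    w = psi * psi % P                     # primitive n-th root of unity
    if negacyclic:
        # Weighted transform: scaling input j by psi^j makes wrap-around terms negative.
        a = [x * pow(psi, j, P) % P for j, x in enumerate(a)]
        b = [x * pow(psi, j, P) % P for j, x in enumerate(b)]
    fa, fb = ntt(a, w), ntt(b, w)
    c = ntt([x * y % P for x, y in zip(fa, fb)], pow(w, -1, P))  # inverse transform
    n_inv = pow(n, -1, P)
    c = [x * n_inv % P for x in c]
    if negacyclic:
        psi_inv = pow(psi, -1, P)
        c = [x * pow(psi_inv, j, P) % P for j, x in enumerate(c)]  # remove the weights
    # Map residues back to signed integers (nega-cyclic coefficients may be negative).
    return [x - P if x > P // 2 else x for x in c]


def to_digits(x, n=N_DIGITS, b=B):
    """Split x (assumed < 2^(n*b)) into n little-endian base-2^b digits."""
    return [(x >> (b * j)) & ((1 << b) - 1) for j in range(n)]


def from_coeffs(c, b=B):
    """Evaluate the coefficient polynomial at 2^b; carries resolve implicitly."""
    return sum(cj << (b * j) for j, cj in enumerate(c))


def product_without_zero_padding(x, y, n=N_DIGITS, b=B):
    """Recover x*y from one cyclic and one nega-cyclic length-n convolution."""
    N = n * b
    m1, m2 = (1 << N) - 1, (1 << N) + 1     # coprime moduli 2^N - 1 and 2^N + 1
    xd, yd = to_digits(x, n, b), to_digits(y, n, b)
    r1 = from_coeffs(convolve(xd, yd, False), b) % m1   # x*y mod (2^N - 1)
    r2 = from_coeffs(convolve(xd, yd, True), b) % m2    # x*y mod (2^N + 1)
    # CRT: unique value modulo (2^N - 1)(2^N + 1) = 2^(2N) - 1, which exceeds x*y.
    return r1 + m1 * ((r2 - r1) * pow(m1, -1, m2) % m2)
```

For example, `product_without_zero_padding(2**64 - 1, 12345)` should equal `(2**64 - 1) * 12345`, even though each convolution has only 8 digits rather than the 16 a zero-padded product would need. The prime P bounds the coefficient growth. With n = 8 and 8-bit digits, every coefficient magnitude stays below n * (2^8 - 1)^2, far below P/2, so the signed mapping is exact. The code needs Python 3.8 or later for `pow(x, -1, m)`.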
Pages: 375-388
Number of pages: 14
Related papers
  • [1] Parameter Space for the Architecture of FFT-Based Montgomery Modular Multiplication
    Chen, Donald Donglong
    Yao, Gavin Xiaoxu
    Cheung, Ray C. C.
    Pao, Derek
    Koc, Cetin Kaya
    [J]. IEEE TRANSACTIONS ON COMPUTERS, 2016, 65 (01) : 147 - 160
  • [2] A Scalable Montgomery Modular Multiplication Architecture with Low Area-Time Product Based on Redundant Binary Representation
    Zhang, Zhaoji
    Zhang, Peiyong
    [J]. ELECTRONICS, 2022, 11 (22)
  • [3] An Iterative Montgomery Modular Multiplication Algorithm With Low Area-Time Product
    Zhang, Bo
    Cheng, Zeming
    Pedram, Massoud
    [J]. IEEE TRANSACTIONS ON COMPUTERS, 2023, 72 (01) : 236 - 249
  • [4] Area-Time Efficient Realization of Multiple Constant Multiplication
    Lou, Xin
    Yu, Ya Jun
    [J]. 2015 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS), 2015: 962 - 965
  • [5] Towards Area-Efficient Optical Neural Networks: An FFT-based Architecture
    Gu, Jiaqi
    Zhao, Zheng
    Feng, Chenghao
    Liu, Mingjie
    Chen, Ray T.
    Pan, David Z.
    [J]. 2020 25TH ASIA AND SOUTH PACIFIC DESIGN AUTOMATION CONFERENCE, ASP-DAC 2020, 2020: 476 - 481
  • [6] Area-time efficient systolic architecture for the DCT
    Meher, PK
    [J]. ADVANCES IN COMPUTER SYSTEMS ARCHITECTURE, PROCEEDINGS, 2005, 3740 : 787 - 794
  • [7] The Area-Time Complexity of Binary Multiplication
    BRENT, RP
    KUNG, HT
    [J]. JOURNAL OF THE ACM, 1981, 28 (03) : 521 - 534
  • [8] Area-Time Efficient Hardware Architecture for Signature Based on Ed448
    Bisheh-Niasar, Mojtaba
    Azarderakhsh, Reza
    Kermani, Mehran Mozaffari
    [J]. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II-EXPRESS BRIEFS, 2021, 68 (08) : 2942 - 2946
  • [9] Area-Time Efficient Hardware Architecture for CRYSTALS-Kyber
    Nguyen, Tuy Tan
    Kim, Sungjae
    Eom, Yongjun
    Lee, Hanho
    [J]. APPLIED SCIENCES-BASEL, 2022, 12 (11)
  • [10] Area-Time Efficient Streaming Architecture for FAST and BRIEF Detector
    Lam, Siew-Kei
    Jiang, Guiyuan
    Wu, Meiqing
    Cao, Bin
    [J]. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II-EXPRESS BRIEFS, 2019, 66 (02) : 282 - 286