Mechanical derivation of fused multiply-add algorithms for linear transforms

Cited by: 5
Authors
Voronenko, Yevgen [1]
Pueschel, Markus [1]
Affiliation
[1] Carnegie Mellon Univ, Dept Elect & Comp Engn, Pittsburgh, PA 15213 USA
Funding
National Science Foundation (USA)
Keywords
automatic program generation; discrete cosine transform (DCT); discrete Fourier transform (DFT); fast algorithm; implementation; multiply-and-accumulate (MAC) instruction
DOI
10.1109/TSP.2007.896116
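Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809
Abstract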
Several computer architectures offer fused multiply-add (FMA) instructions, also called multiply-and-accumulate (MAC) instructions, that are as fast as a single addition or multiplication. For the efficient implementation of linear transforms, such as the discrete Fourier transform or discrete cosine transforms, this poses a challenge to algorithm developers, as standard transform algorithms have to be manipulated into FMA algorithms that make optimal use of FMA instructions. We present a general method to convert any transform algorithm into an FMA algorithm. The method works with both algorithms given as directed acyclic graphs (DAGs) and algorithms given as structured matrix factorizations. We prove bounds on the efficiency of the method. In particular, we show that it removes all single multiplications except at most as many as the transform has outputs. We implemented the DAG-based version of the method and show that we can generate many of the best-known hand-derived FMA algorithms from the literature as well as a few novel FMA algorithms.
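To make the idea of an "FMA algorithm" concrete, the following is a minimal sketch of a standard textbook manipulation, not the paper's derivation method. It rewrites a plane rotation, a common building block of DFT and DCT algorithms, so that most multiplications are absorbed into FMAs. The function names and the `fma` helper are hypothetical; the helper only emulates a hardware FMA for operation counting, and the sketch assumes cos(theta) is nonzero.

```python
import math


def fma(a, b, c):
    # Hypothetical stand-in for a hardware FMA: a*b + c counted as ONE operation.
    # Plain Python rounds twice here; a real FMA instruction rounds once.
    return a * b + c


def rotate_plain(x0, x1, theta):
    # Standard form: 4 multiplications + 2 additions = 6 operations.
    c, s = math.cos(theta), math.sin(theta)
    return c * x0 - s * x1, s * x0 + c * x1


def rotate_fma(x0, x1, theta):
    # Factor cos(theta) out of both outputs (assumes cos(theta) != 0):
    #   y0 = c * (x0 - t*x1),  y1 = c * (t*x0 + x1),  with t = tan(theta).
    # In a real implementation c and t would be precomputed constants.
    c, t = math.cos(theta), math.tan(theta)
    u0 = fma(-t, x1, x0)  # x0 - t*x1, one FMA
    u1 = fma(t, x0, x1)   # t*x0 + x1, one FMA
    # Two single multiplications remain, one per output: 4 operations total.
    return c * u0, c * u1
```

In this small case the only multiplications left outside an FMA sit at the two outputs, which is consistent with the bound stated in the abstract (at most as many single multiplications as the transform has outputs). In a larger algorithm such output scalings can often be pushed into the next stage's constants.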
Pages: 4458-4473
Page count: 16