Mechanical derivation of fused multiply-add algorithms for linear transforms

Times cited: 5
Authors
Voronenko, Yevgen [1]
Pueschel, Markus [1]
Affiliation
[1] Carnegie Mellon Univ, Dept Elect & Comp Engn, Pittsburgh, PA 15213 USA
Funding
U.S. National Science Foundation
Keywords
automatic program generation; discrete cosine transform (DCT); discrete Fourier transform (DFT); fast algorithm; implementation; multiply-and-accumulate (MAC) instruction
DOI
10.1109/TSP.2007.896116
Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809
Abstract
Several computer architectures offer fused multiply-add (FMA) instructions, also called multiply-and-accumulate (MAC) instructions, that are as fast as a single addition or multiplication. For the efficient implementation of linear transforms, such as the discrete Fourier transform or discrete cosine transforms, this poses a challenge to algorithm developers, since standard transform algorithms have to be manipulated into FMA algorithms that make optimal use of FMA instructions. We present a general method to convert any transform algorithm into an FMA algorithm. The method works both with algorithms given as directed acyclic graphs (DAGs) and with algorithms given as structured matrix factorizations. We prove bounds on the efficiency of the method. In particular, we show that it removes all single multiplications except at most as many as the transform has outputs. We implemented the DAG-based version of the method and show that it generates many of the best-known hand-derived FMA algorithms from the literature as well as a few novel FMA algorithms.
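As a rough illustration of the idea in the abstract (not the authors' DAG-based method), the toy sketch below converts a 2-point rotation into an FMA form by factoring the constant c toward the outputs. The naive form needs 4 multiplications and 2 additions; the FMA form needs 2 FMAs plus 2 leftover multiplications, one per output, which matches the stated bound of at most as many single multiplications as outputs. The names `fma`, `rotation_naive`, and `rotation_fma` are hypothetical, the example assumes c != 0, and `fma` is only emulated in plain Python (it rounds twice, whereas a hardware FMA rounds once).

```python
import math


def fma(a: float, b: float, c: float) -> float:
    """Emulated fused multiply-add a*b + c.

    Plain Python rounds the product and the sum separately; a real
    hardware FMA instruction rounds only once.
    """
    return a * b + c


def rotation_naive(x0: float, x1: float, c: float, s: float):
    """Standard rotation: 4 multiplications + 2 additions = 6 operations."""
    y0 = c * x0 + s * x1
    y1 = -s * x0 + c * x1
    return y0, y1, 6


def rotation_fma(x0: float, x1: float, c: float, s: float):
    """Same rotation rewritten so every addition absorbs a multiplication.

    Uses y0 = c * (x0 + t*x1) and y1 = c * (x1 - t*x0) with t = s / c,
    which assumes c != 0. The constant t would be precomputed offline.
    """
    t = s / c
    u0 = fma(t, x1, x0)   # x0 + t*x1  (one FMA)
    u1 = fma(-t, x0, x1)  # x1 - t*x0  (one FMA)
    # The remaining single multiplications sit at the two outputs.
    return c * u0, c * u1, 4


if __name__ == "__main__":
    c, s = math.cos(0.3), math.sin(0.3)
    print(rotation_naive(1.5, -2.0, c, s))
    print(rotation_fma(1.5, -2.0, c, s))
```

Both functions compute the same outputs up to rounding; only the operation count differs. Moving multiplications toward the outputs is what lets each addition inside the computation become an FMA.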
Pages: 4458-4473
Number of pages: 16