Generation of permutations for SIMD processors

Cited by: 4
Authors
Kudriavtsev, A [1]
Kogge, P [1]
Affiliation
[1] Univ Notre Dame, Dept Comp Sci & Engn, Notre Dame, IN 46556 USA
Keywords
SIMD; permutations
DOI
10.1145/1070891.1065931
Chinese Library Classification number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
Short vector (SIMD) instructions are useful in signal processing, multimedia, and scientific applications. They offer higher performance, lower energy consumption, and better resource utilization. However, compilers still do not have good support for SIMD instructions, and often the code has to be written manually in assembly language or using compiler built-in functions. Also, in some applications, higher parallelism could be achieved if compilers inserted permutation instructions that reorder the data in registers. In this paper we describe how we create SIMD instructions from regular code, and determine the ordering of individual operations in the SIMD instructions to minimize the number of permutation instructions. Individual memory operations are grouped into SIMD operations based on their effective addresses. The SIMD data flow graph is then constructed by following data dependences from SIMD memory operations. Then, the orderings of operations are propagated from SIMD memory operations into the graph. We also describe our approach to computing the decomposition of a given permutation into the permutation instructions of the target architecture. Experiments with our prototype compiler show that this approach scales well with the number of operations in SIMD instructions (SIMD width) and can be used to compile a number of important kernels, achieving up to 35% speedup.
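The abstract does not say how the decomposition is computed, so the following is only an illustrative sketch of the problem, not the authors' algorithm. It assumes a hypothetical 4-lane target whose only permutation instructions are the three invented primitives `swap_pairs`, `rotate_left`, and `swap_halves`, and it uses a plain breadth-first search to find a shortest sequence of them that realizes a requested lane ordering.

```python
from collections import deque

def apply(perm, vec):
    """Return the vector whose lane i holds vec[perm[i]]."""
    return tuple(vec[p] for p in perm)

def decompose(target, primitives):
    """Breadth-first search for a shortest list of primitive names that
    turns the identity lane order into `target`; None if unreachable."""
    target = tuple(target)
    identity = tuple(range(len(target)))
    seen = {identity: []}          # lane order -> instruction sequence
    queue = deque([identity])
    while queue:
        state = queue.popleft()
        if state == target:
            return seen[state]
        for name, perm in primitives.items():
            nxt = apply(perm, state)
            if nxt not in seen:
                seen[nxt] = seen[state] + [name]
                queue.append(nxt)
    return None

# Hypothetical permutation instructions of an imaginary 4-lane target.
PRIMITIVES = {
    "swap_pairs":  (1, 0, 3, 2),
    "rotate_left": (1, 2, 3, 0),
    "swap_halves": (2, 3, 0, 1),
}

if __name__ == "__main__":
    # Ask for a full lane reversal and print the instruction sequence found.
    print(decompose((3, 2, 1, 0), PRIMITIVES))
```

Exhaustive search like this is only practical for small SIMD widths, since the number of lane orderings grows factorially; a production compiler would need a more structured method.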
Pages: 147-156
Number of pages: 10