Generation of permutations for SIMD processors

Cited by: 4
Authors
Kudriavtsev, A [1]
Kogge, P [1]
Affiliations
[1] Univ Notre Dame, Dept Comp Sci & Engn, Notre Dame, IN 46556 USA
Keywords
SIMD; permutations
DOI
10.1145/1070891.1065931
Chinese Library Classification (CLC) number
TP31 [Computer software];
Subject classification code
081202 ; 0835 ;
Abstract
Short vector (SIMD) instructions are useful in signal processing, multimedia, and scientific applications. They offer higher performance, lower energy consumption, and better resource utilization. However, compilers still do not have good support for SIMD instructions, and the code often has to be written manually in assembly language or with compiler built-in functions. In some applications, higher parallelism could also be achieved if compilers inserted permutation instructions that reorder the data in registers. In this paper we describe how we create SIMD instructions from regular code and determine the ordering of individual operations within the SIMD instructions so as to minimize the number of permutation instructions. Individual memory operations are grouped into SIMD operations based on their effective addresses. The SIMD data flow graph is then constructed by following data dependences from the SIMD memory operations, and the orderings of operations are propagated from the SIMD memory operations into the graph. We also describe our approach to computing a decomposition of a given permutation into the permutation instructions of the target architecture. Experiments with our prototype compiler show that this approach scales well with the number of operations in a SIMD instruction (the SIMD width) and can be used to compile a number of important kernels, achieving up to a 35% speedup.
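The last step in the abstract, decomposing a given permutation into the permutation instructions of the target, can be illustrated with a small search. The sketch below is not the authors' algorithm: it is a plain breadth-first search over a hypothetical 4-lane instruction set (the names `rot1`, `swap_pairs`, and `swap_halves` are invented for this example), and it only handles permutations within a single register.

```python
from collections import deque

# Hypothetical 4-lane permutation instructions (assumed for illustration,
# not taken from the paper). Output lane i reads input lane perm[i].
INSTRUCTIONS = {
    "rot1": (1, 2, 3, 0),         # rotate lanes left by one
    "swap_pairs": (1, 0, 3, 2),   # swap the two lanes inside each pair
    "swap_halves": (2, 3, 0, 1),  # exchange the low and high halves
}


def apply_perm(perm, lanes):
    """Reorder lanes so that output[i] = lanes[perm[i]]."""
    return tuple(lanes[p] for p in perm)


def decompose(target, instructions=INSTRUCTIONS):
    """Find a shortest instruction sequence that produces `target`.

    `target[i]` is the original lane that must end up in output lane i.
    Returns a list of instruction names, or None if the instruction set
    cannot realize the permutation.
    """
    target = tuple(target)
    start = tuple(range(len(target)))
    paths = {start: []}           # register state -> instructions used so far
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state == target:
            return paths[state]
        for name, perm in instructions.items():
            nxt = apply_perm(perm, state)
            if nxt not in paths:  # first visit is a shortest path in BFS
                paths[nxt] = paths[state] + [name]
                queue.append(nxt)
    return None


if __name__ == "__main__":
    print(decompose((3, 2, 1, 0)))  # full lane reversal
    print(decompose((1, 0, 2, 3)))  # swap of lanes 0 and 1 only
```

With this particular instruction set, reversing all four lanes takes two instructions, while swapping only lanes 0 and 1 is not expressible and yields `None`. The search enumerates register states exhaustively, so it is practical only for small SIMD widths; it is meant to show the shape of the problem, not a scalable solution.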
Pages: 147-156
Number of pages: 10