Generation of permutations for SIMD processors

Cited by: 4
Authors
Kudriavtsev, A [1]
Kogge, P [1]
Affiliation
[1] Univ Notre Dame, Dept Comp Sci & Engn, Notre Dame, IN 46556 USA
Keywords
SIMD; permutations
DOI
10.1145/1070891.1065931
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline codes
081202; 0835
Abstract
Short vector (SIMD) instructions are useful in signal processing, multimedia, and scientific applications. They offer higher performance, lower energy consumption, and better resource utilization. However, compilers still have poor support for SIMD instructions, so code often has to be written manually in assembly language or with compiler built-in functions. Moreover, in some applications higher parallelism could be achieved if compilers inserted permutation instructions that reorder the data in registers. In this paper we describe how we create SIMD instructions from regular code and how we determine the ordering of individual operations within the SIMD instructions so as to minimize the number of permutation instructions. Individual memory operations are grouped into SIMD operations based on their effective addresses. The SIMD data flow graph is then constructed by following data dependences from the SIMD memory operations. Next, the orderings of operations are propagated from the SIMD memory operations into the graph. We also describe our approach to computing the decomposition of a given permutation into the permutation instructions of the target architecture. Experiments with our prototype compiler show that this approach scales well with the number of operations in SIMD instructions (the SIMD width) and can be used to compile a number of important kernels, achieving speedups of up to 35%.
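The first steps described in the abstract (grouping memory operations by effective address, propagating lane orderings from memory operations, and detecting where a permutation is required) can be illustrated with a minimal Python sketch. This is not the authors' prototype compiler: `MemOp`, `group_by_address`, `propagate_order`, `lane_permutation`, and the aligned-block grouping rule are hypothetical simplifications, and the sketch propagates only the first operand's ordering.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemOp:
    """A scalar memory operation (hypothetical IR node for this sketch)."""
    kind: str   # "load" or "store"
    array: str  # base array symbol; effective address = base + index * elem_size
    index: int  # element index into the array

    @property
    def name(self) -> str:
        return f"{self.array}[{self.index}]"


def group_by_address(ops, width):
    """Pack scalar memory ops into SIMD ops of `width` adjacent elements.

    Assumed grouping rule: ops of the same kind on the same array whose indices
    fall in one aligned block [k*width, (k+1)*width) form a group, with lane i
    holding index k*width + i. Incomplete blocks are left out (stay scalar).
    """
    blocks = {}
    for op in ops:
        key = (op.kind, op.array, op.index // width)
        blocks.setdefault(key, {})[op.index % width] = op
    groups = []
    for key in sorted(blocks):
        lanes = blocks[key]
        if len(lanes) == width:
            groups.append([lanes[i] for i in range(width)])
    return groups


def propagate_order(load_group, stmts):
    """Order scalar statements so that lane i consumes the value loaded in lane i.

    stmts holds (dest, lhs, rhs) element names; the SIMD operation inherits the
    lane order of its first operand's load group (the second operand is assumed
    to be grouped in a matching order).
    """
    lane_of = {op.name: i for i, op in enumerate(load_group)}
    ordered = sorted(stmts, key=lambda s: lane_of[s[1]])
    return [dest for dest, _, _ in ordered]


def lane_permutation(src_order, dst_order):
    """Return mask p with dst_order[i] == src_order[p[i]] (identity = no shuffle)."""
    pos = {name: i for i, name in enumerate(src_order)}
    return [pos[name] for name in dst_order]


# Usage: c[i] = a[3-i] + b[3-i] for i in 0..3, with SIMD width 4.
W = 4
ops = ([MemOp("load", "a", i) for i in range(W)]
       + [MemOp("load", "b", i) for i in range(W)]
       + [MemOp("store", "c", i) for i in range(W)])
groups = group_by_address(ops, W)
load_a = next(g for g in groups if g[0].kind == "load" and g[0].array == "a")
store_c = next(g for g in groups if g[0].kind == "store")
stmts = [(f"c[{i}]", f"a[{W - 1 - i}]", f"b[{W - 1 - i}]") for i in range(W)]
add_order = propagate_order(load_a, stmts)       # ['c[3]', 'c[2]', 'c[1]', 'c[0]']
store_order = [op.name for op in store_c]        # ['c[0]', 'c[1]', 'c[2]', 'c[3]']
print(lane_permutation(add_order, store_order))  # [3, 2, 1, 0]
```

A non-identity mask such as `[3, 2, 1, 0]` marks a point where a permutation instruction is needed before the store. Expressing that mask with the target architecture's actual permutation instructions (the decomposition step the abstract mentions) is not modeled in this sketch.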
Pages: 147-156
Page count: 10
Related papers
50 in total (entries 41-50 shown)
  • [41] SPEEDING UP FILTERED BACK PROJECTION USING SIMD ARRAY PROCESSORS
    TASTO, M
    JOURNAL OF COMPUTER ASSISTED TOMOGRAPHY, 1977, 1 (02): 258
  • [42] IMPLEMENTATION OF HEVC DECODER ON X86 PROCESSORS WITH SIMD OPTIMIZATION
    Yan, Leju
    Duan, Yizhou
    Sun, Jun
    Guo, Zongming
    2012 IEEE VISUAL COMMUNICATIONS AND IMAGE PROCESSING (VCIP), 2012
  • [43] On the cost effectiveness of logarithmic arithmetic for back propagation training on SIMD processors
    Arnold, MG
    Bailey, TA
    Cupal, JJ
    Winkel, MD
    1997 IEEE INTERNATIONAL CONFERENCE ON NEURAL NETWORKS, VOLS 1-4, 1997: 933-936
  • [44] A Fine-Grained Pipelined Implementation of LU Decomposition on SIMD Processors
    Zhang, Kai
    Chen, ShuMing
    Liu, Wei
    Ning, Xi
    NETWORK AND PARALLEL COMPUTING, NPC 2013, 2013, 8147: 39-48
  • [45] HOUGH TRANSFORM ALGORITHMS FOR MESH-CONNECTED SIMD PARALLEL PROCESSORS
    ROSENFELD, A
    ORNELAS, J
    HUNG, Y
    COMPUTER VISION GRAPHICS AND IMAGE PROCESSING, 1988, 41 (03): 293-305
  • [46] Explicit data organization SIMD instruction set architecture for media processors
    Liu, Chunyue
    Qin, Xing
    Yan, Xiaolang
    PROCEEDINGS OF THE IASTED INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED COMPUTING AND NETWORKS, 2007: 227 ff.
  • [47] Evaluation of SIMD architecture enhancement in embedded processors for MPEG-4
    Iranpour, AR
    Kuchcinski, K
    PROCEEDINGS OF THE EUROMICRO SYSTEMS ON DIGITAL SYSTEM DESIGN, 2004: 262-269
  • [48] A HW/SW design methodology for embedded SIMD vector signal processors
    Robelly, J. P.
    Cichon, G.
    Ahlendorf, H.
    Fettweis, G.
    INTERNATIONAL JOURNAL OF EMBEDDED SYSTEMS, 2008, 3 (03): 160-169
  • [49] Fractal terrain generation for SIMD architectures
    Boyapati, Meghashyam
    Rankin, John R.
    INTERNATIONAL JOURNAL OF COMPUTER APPLICATIONS IN TECHNOLOGY, 2009, 34 (04): 298-302
  • [50] GENERATION OF PERMUTATIONS IN LEXICOGRAPHIC ORDER
    LEITCH, IM
    COMMUNICATIONS OF THE ACM, 1969, 12 (09): 512 ff.