Generation of permutations for SIMD processors

Cited by: 4

Authors:

Kudriavtsev, A [1]

Kogge, P [1]

Affiliation:

[1] Univ Notre Dame, Dept Comp Sci & Engn, Notre Dame, IN 46556 USA

Source:

ACM SIGPLAN NOTICES | 2005 / Vol. 40 / No. 07

Keywords:

SIMD; permutations

DOI:

10.1145/1070891.1065931

Chinese Library Classification (CLC) number:

TP31 [Computer software]

Discipline classification codes:

081202; 0835

Abstract:

Short vector (SIMD) instructions are useful in signal processing, multimedia, and scientific applications. They offer higher performance, lower energy consumption, and better resource utilization. However, compilers still do not have good support for SIMD instructions, and often the code has to be written manually in assembly language or using compiler builtin functions. Also, in some applications, higher parallelism could be achieved if compilers inserted permutation instructions that reorder the data in registers. In this paper we describe how we create SIMD instructions from regular code, and determine the ordering of individual operations in the SIMD instructions to minimize the number of permutation instructions. Individual memory operations are grouped into SIMD operations based on their effective addresses. The SIMD data flow graph is then constructed by following data dependences from SIMD memory operations. Then, the orderings of operations are propagated from SIMD memory operations into the graph. We also describe our approach to computing the decomposition of a given permutation into the permutation instructions of the target architecture. Experiments with our prototype compiler show that this approach scales well with the number of operations in SIMD instructions (SIMD width) and can be used to compile a number of important kernels, achieving up to 35% speedup.
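
The decomposition step mentioned in the abstract can be illustrated with a small sketch. The abstract does not say how the authors compute the decomposition, so the code below is not their method: it substitutes a plain breadth-first search over lane orderings, and the 4-lane primitive set (`rot1`, `swap_pairs`, `swap_halves`) and the names `apply_perm` and `decompose` are hypothetical, chosen only for illustration.

```python
from collections import deque


def apply_perm(perm, vec):
    """Return the vector whose lane i holds vec[perm[i]]."""
    return tuple(vec[i] for i in perm)


def decompose(target, primitives):
    """Breadth-first search for a shortest sequence of primitive names that
    rearranges the identity ordering into `target`.

    Returns a list of names, or None if `target` cannot be reached.
    This is an illustrative stand-in, not the algorithm from the paper.
    """
    start = tuple(range(len(target)))
    target = tuple(target)
    parent = {start: None}  # state -> (previous state, primitive name)
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state == target:
            # Walk the parent links back to the identity ordering.
            names = []
            while parent[state] is not None:
                state, name = parent[state]
                names.append(name)
            return names[::-1]
        for name, perm in primitives.items():
            nxt = apply_perm(perm, state)
            if nxt not in parent:
                parent[nxt] = (state, name)
                queue.append(nxt)
    return None


# Hypothetical 4-lane permutation instructions (assumptions, not a real ISA).
PRIMITIVES = {
    "rot1": (1, 2, 3, 0),         # rotate lanes left by one
    "swap_pairs": (1, 0, 3, 2),   # swap neighbours within each pair
    "swap_halves": (2, 3, 0, 1),  # exchange the low and high halves
}

if __name__ == "__main__":
    # Decompose a full lane reversal into the hypothetical primitives.
    print(decompose((3, 2, 1, 0), PRIMITIVES))
```

Because the search visits orderings level by level, the first sequence found uses the fewest primitive instructions. The state space grows factorially with SIMD width, so an exhaustive search like this is only practical for small widths.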

Pages: 147 - 156

Page count: 10

50 records in total

[41] SPEEDING UP FILTERED BACK PROJECTION USING SIMD ARRAY PROCESSORS
TASTO, M
JOURNAL OF COMPUTER ASSISTED TOMOGRAPHY, 1977, 1 (02) : 258 - 258
[42] IMPLEMENTATION OF HEVC DECODER ON X86 PROCESSORS WITH SIMD OPTIMIZATION
Yan, Leju
Duan, Yizhou
Sun, Jun
Guo, Zongming
2012 IEEE VISUAL COMMUNICATIONS AND IMAGE PROCESSING (VCIP), 2012,
[43] On the cost effectiveness of logarithmic arithmetic for back propagation training on SIMD processors
Arnold, MG
Bailey, TA
Cupal, JJ
Winkel, MD
1997 IEEE INTERNATIONAL CONFERENCE ON NEURAL NETWORKS, VOLS 1-4, 1997, : 933 - 936
[44] A Fine-Grained Pipelined Implementation of LU Decomposition on SIMD Processors
Zhang, Kai
Chen, ShuMing
Liu, Wei
Ning, Xi
NETWORK AND PARALLEL COMPUTING, NPC 2013, 2013, 8147 : 39 - 48
[45] HOUGH TRANSFORM ALGORITHMS FOR MESH-CONNECTED SIMD PARALLEL PROCESSORS
ROSENFELD, A
ORNELAS, J
HUNG, Y
COMPUTER VISION GRAPHICS AND IMAGE PROCESSING, 1988, 41 (03) : 293 - 305
[46] Explicit data organization SIMD instruction set architecture for media processors
Liu, Chunyue
Qin, Xing
Yan, Xiaolang
PROCEEDINGS OF THE IASTED INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED COMPUTING AND NETWORKS, 2007, : 227 - +
[47] Evaluation of SIMD architecture enhancement in embedded processors for MPEG-4
Iranpour, AR
Kuchcinski, K
PROCEEDINGS OF THE EUROMICRO SYSTEMS ON DIGITAL SYSTEM DESIGN, 2004, : 262 - 269
[48] A HW/SW design methodology for embedded SIMD vector signal processors
Robelly, J. P.
Cichon, G.
Ahlendorf, H.
Fettweis, G.
INTERNATIONAL JOURNAL OF EMBEDDED SYSTEMS, 2008, 3 (03) : 160 - 169
[49] Fractal terrain generation for SIMD architectures
Boyapati, Meghashyam
Rankin, John R.
INTERNATIONAL JOURNAL OF COMPUTER APPLICATIONS IN TECHNOLOGY, 2009, 34 (04) : 298 - 302
[50] GENERATION OF PERMUTATIONS IN LEXICOGRAPHIC ORDER
LEITCH, IM
COMMUNICATIONS OF THE ACM, 1969, 12 (09) : 512 - &
