Generation of permutations for SIMD processors

Citations: 4

Authors:

Kudriavtsev, A [1]

Kogge, P [1]

Affiliations:

[1] Univ Notre Dame, Dept Comp Sci & Engn, Notre Dame, IN 46556 USA

Source:

ACM SIGPLAN NOTICES | 2005, Vol. 40, No. 7

Keywords:

SIMD; permutations

DOI:

10.1145/1070891.1065931

Chinese Library Classification (CLC) number:

TP31 [Computer software]

Discipline classification code:

081202; 0835

Abstract:

Short vector (SIMD) instructions are useful in signal processing, multimedia, and scientific applications. They offer higher performance, lower energy consumption, and better resource utilization. However, compilers still lack good support for SIMD instructions, and code often has to be written manually in assembly language or with compiler built-in functions. In some applications, higher parallelism could also be achieved if compilers inserted permutation instructions that reorder the data in registers. In this paper we describe how we create SIMD instructions from regular code and determine the ordering of individual operations within the SIMD instructions so as to minimize the number of permutation instructions. Individual memory operations are grouped into SIMD operations based on their effective addresses. The SIMD data flow graph is then constructed by following data dependences from the SIMD memory operations, and the orderings of operations are propagated from the SIMD memory operations into the graph. We also describe our approach to decomposing a given permutation into the permutation instructions of the target architecture. Experiments with our prototype compiler show that this approach scales well with the number of operations in a SIMD instruction (the SIMD width) and can be used to compile a number of important kernels, achieving up to 35% speedup.
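
To make the permutation-decomposition step more concrete, the following is a minimal sketch and not the method from the paper, whose algorithm the abstract does not spell out. It substitutes a plain breadth-first search over instruction sequences. The 4-lane register width, the instruction names `rot1`, `swap_pairs`, and `swap_halves`, and the helper names `apply_perm` and `decompose` are all hypothetical and chosen only for illustration.

```python
from collections import deque

# Hypothetical single-register permutation instructions for a 4-lane vector.
# Each tuple p means: output lane i receives the value from input lane p[i].
# These are illustrative only and do not model any particular architecture.
INSTRUCTIONS = {
    "rot1": (1, 2, 3, 0),         # rotate lanes left by one
    "swap_pairs": (1, 0, 3, 2),   # swap the two lanes inside each pair
    "swap_halves": (2, 3, 0, 1),  # swap the low and high halves
}


def apply_perm(perm, state):
    """Apply one permutation instruction to the current lane arrangement."""
    return tuple(state[i] for i in perm)


def decompose(target, instructions=INSTRUCTIONS):
    """Return a shortest list of instruction names that turns the identity
    arrangement into `target`, or None if `target` is unreachable.

    Brute-force breadth-first search: practical only for small widths,
    since the state space has up to n! arrangements.
    """
    target = tuple(target)
    start = tuple(range(len(target)))
    parent = {start: None}  # arrangement -> (previous arrangement, name)
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state == target:
            sequence = []
            while parent[state] is not None:
                state, name = parent[state]
                sequence.append(name)
            return sequence[::-1]
        for name, perm in instructions.items():
            nxt = apply_perm(perm, state)
            if nxt not in parent:
                parent[nxt] = (state, name)
                queue.append(nxt)
    return None
```

With this assumed instruction set, a full lane reversal `(3, 2, 1, 0)` needs two instructions, while an arrangement such as `(0, 1, 3, 2)` is not reachable at all. This illustrates why the choice of target permutation instructions affects how many of them a compiler must insert.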


Pages: 147-156

Page count: 10

