SIMD programming using Intel vector extensions

Cited by: 26

Authors:

Amiri, Hossein [1]

Shahbahrami, Asadollah [1]

Affiliations:

[1] Univ Guilan, Dept Comp Engn, Fac Engn, Rasht, Iran

Source:

JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING | 2020 / Volume 135 / Issue 135

Keywords:

Intel; SIMD; AVX; AVX-512; Vectorization; PERFORMANCE EVALUATION; PARALLELISM; IMAGE

DOI:

10.1016/j.jpdc.2019.09.012

Chinese Library Classification (CLC) number:

TP301 [Theory, methods]

Discipline classification code:

081202

Abstract:

Single instruction, multiple data (SIMD) extensions are among the most significant capabilities of recent general-purpose processors (GPPs), improving application performance with little hardware modification. Each GPP vendor, such as HP, Sun, Intel, and AMD, has its own instruction set architecture (ISA) and SIMD micro-architecture, designed from a different perspective. Intel has expanded its SIMD technologies from both the hardware and the software point of view. It has introduced SIMD technologies such as MultiMedia eXtensions (MMX), Streaming SIMD Extensions (SSE), Advanced Vector eXtensions (AVX), Fused Multiply-Add (FMA), and the AVX-512 sets. Over the course of microprocessor development, register width has been extended from 64 bits to 512 bits, and the number of vector registers has increased from 8 to 32. Wider registers provide more parallelism, and more registers reduce extra data movement to the cache memory. To exploit SIMD extensions, many programming approaches have been developed. Compiler Automatic Vectorization (CAV), an implicit vectorization approach, provides simple and easy-to-use SIMDization tools. Although a performance improvement from CAV is not always guaranteed, most compilers auto-vectorize simple loops. For explicit vectorization, on the other hand, the Intrinsic Programming Model (IPM) provides low-level access to vector registers. However, programming with IPM requires considerable expertise, especially in low-level architectural features, so choosing suitable instructions and a suitable vectorization methodology for a given algorithm is important. Moreover, portability, compatibility, scalability, and compiler optimization might limit the advantage of IPM. Our goals in this paper are as follows. First, we provide a review of SIMD technology in general and Intel's SIMD extensions in particular. Second, some features of Intel's SIMD technologies (MMX, the SSE family, AVX, and FMA) are compared in terms of ISA, vector width, and SIMD programming tools. Third, to compare the performance of different auto-vectorizers and IPM approaches using the Intel C++ Compiler (ICC), the GNU Compiler Collection (GCC), and Low Level Virtual Machine (LLVM), we map and implement some representative multimedia kernels on the AVX and AVX2 extensions. Finally, our experimental results show that although the performance improvement with the IPM approach is higher than with CAV, the programmer must invest more programming effort and know different mapping strategies. Therefore, extending the ability of auto-vectorizers to generate more efficient vectorized code is an important issue for different compilers. (C) 2019 Elsevier Inc. All rights reserved.
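The contrast the abstract draws between CAV and IPM can be illustrated with a minimal sketch. It is not taken from the paper: the kernel (element-wise float addition) and the function names `add_scalar`, `add_avx`, and `add_dispatch` are hypothetical, and the code assumes GCC or Clang on an x86 target. The scalar loop is the kind of simple loop an auto-vectorizer may handle on its own; the AVX version spells out the same work with intrinsics, eight floats per 256-bit register.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: GCC or Clang on x86; elsewhere only the scalar path is built. */
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>
#define HAVE_X86_SIMD 1
#else
#define HAVE_X86_SIMD 0
#endif

/* CAV candidate: a simple loop that a compiler may auto-vectorize
 * at higher optimization levels. No vectorization is guaranteed. */
void add_scalar(const float *a, const float *b, float *out, size_t n) {
    for (size_t i = 0; i < n; ++i)
        out[i] = a[i] + b[i];
}

#if HAVE_X86_SIMD
/* IPM version: explicit AVX intrinsics, 8 floats per iteration.
 * The target attribute lets this one function use AVX without -mavx. */
__attribute__((target("avx")))
static void add_avx(const float *a, const float *b, float *out, size_t n) {
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);   /* unaligned 256-bit load */
        __m256 vb = _mm256_loadu_ps(b + i);
        _mm256_storeu_ps(out + i, _mm256_add_ps(va, vb));
    }
    for (; i < n; ++i)                        /* scalar tail for n % 8 */
        out[i] = a[i] + b[i];
}
#endif

/* Hypothetical helper: use the AVX path only if the CPU reports AVX. */
void add_dispatch(const float *a, const float *b, float *out, size_t n) {
    assert(a && b && out);
#if HAVE_X86_SIMD
    if (__builtin_cpu_supports("avx")) {
        add_avx(a, b, out, n);
        return;
    }
#endif
    add_scalar(a, b, out, n);
}
```

Calling `add_dispatch(a, b, out, n)` on two float arrays gives the same result as `add_scalar`. The intrinsic version also shows the costs the abstract attributes to IPM: the tail loop, the choice of unaligned loads, and the run-time feature check are all left to the programmer.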

Pages: 83-100

Page count: 18

50 items in total

[1] Using Intel Streaming SIMD extensions for 3D geometry processing
Ma, WC
Yang, CL
[J]. ADVANCES IN MULTIMEDIA INFORMATION PROCESSING - PCM 2002, PROCEEDING, 2002, 2532 : 1080 - 1087
[2] Investigating Large Integer Arithmetic on Intel Xeon Phi SIMD Extensions
Keliris, Anastasis
Maniatakos, Michail
[J]. 2014 9TH IEEE INTERNATIONAL CONFERENCE ON DESIGN & TECHNOLOGY OF INTEGRATED SYSTEMS IN NANOSCALE ERA (DTIS 2014), 2014,
[3] Performance Study of SIMD Programming Models on Intel Multicore Processors
Kristof, Peter
Yu, Hongtao
Li, Zhiyuan
Tian, Xinmin
[J]. 2012 IEEE 26TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS & PHD FORUM (IPDPSW), 2012, : 2423 - 2432
[4] C-for-Metal: High Performance SIMD Programming on Intel GPUs
Lueh, Guei-Yuan
Chen, Kaiyu
Chen, Gang
Fuentes, Joel
Chen, Wei-Yu
Fu, Fangwen
Jiang, Hong
Li, Hongzheng
Rhee, Daniel
[J]. CGO '21: PROCEEDINGS OF THE 2021 IEEE/ACM INTERNATIONAL SYMPOSIUM ON CODE GENERATION AND OPTIMIZATION (CGO), 2021, : 289 - 300
[5] Ultra fast grey scale face detection using vector SIMD programming
Vermeulen, O.
Manzanera, A.
Lacassagne, L.
[J]. SITIS 2007: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON SIGNAL IMAGE TECHNOLOGIES & INTERNET BASED SYSTEMS, 2008, : 585 - +
[6] PARALLELIZATION OF WAVELET FILTERS USING SIMD EXTENSIONS
Kutil, Rade
Eder, Peter
[J]. PARALLEL PROCESSING LETTERS, 2006, 16 (03) : 335 - 349
[7] Parallelization of IIR filters using SIMD extensions
Kutil, Rade
[J]. PROCEEDINGS OF IWSSIP 2008: 15TH INTERNATIONAL CONFERENCE ON SYSTEMS, SIGNALS AND IMAGE PROCESSING, 2008, : 65 - 68
[8] Performance aspects of using various techniques of programming SIMD extensions of modern general-purpose processors
Trocki, Krzysztof
[J]. PROCEEDINGS OF THE 2008 1ST INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY, 2008, : 413 - 416
[9] Parsimony: Enabling SIMD/Vector Programming in Standard Compiler Flows
Kandiah, Vijay
Lustig, Daniel
Villa, Oreste
Nellans, David
Hardavellas, Nikos
[J]. PROCEEDINGS OF THE 21ST ACM/IEEE INTERNATIONAL SYMPOSIUM ON CODE GENERATION AND OPTIMIZATION, CGO 2023, 2023, : 186 - 198
[10] Exploring SIMD for Molecular Dynamics, Using Intel® Xeon® Processors and Intel® Xeon Phi™ Coprocessors
Pennycook, S. J.
Hughes, C. J.
Smelyanskiy, M.
Jarvis, S. A.
[J]. IEEE 27TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS 2013), 2013, : 1085 - 1097