SIMD programming using Intel vector extensions

Cited by: 26
Authors
Amiri, Hossein [1]
Shahbahrami, Asadollah [1]
Affiliations
[1] Univ Guilan, Dept Comp Engn, Fac Engn, Rasht, Iran
Keywords
Intel; SIMD; AVX; AVX-512; Vectorization; PERFORMANCE EVALUATION; PARALLELISM; IMAGE;
DOI
10.1016/j.jpdc.2019.09.012
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Single instruction multiple data (SIMD) extensions are one of the most significant capabilities of recent General Purpose Processors (GPPs), improving application performance with little hardware modification. Each GPP vendor, such as HP, Sun, Intel, and AMD, has its own Instruction Set Architecture (ISA) and SIMD micro-architecture, each with a different perspective. Intel has expanded its SIMD technologies from both the hardware and the software point of view, introducing MultiMedia eXtensions (MMX), Streaming SIMD Extensions (SSE), Advanced Vector eXtensions (AVX), Fused Multiply Add (FMA), and AVX-512. Along this microprocessor development path, register width has been extended from 64 bits to 512 bits, and the number of vector registers has increased from 8 to 32. Wider registers provide more parallelism, and more registers reduce extra data movement to the cache memory. Many programming approaches have been developed to exploit SIMD extensions. Compiler Automatic Vectorization (CAV), an implicit vectorization approach, provides simple and easy SIMDization tools. Although performance improvement from CAV is not always guaranteed, most compilers auto-vectorize simple loops. For explicit vectorization, on the other hand, the Intrinsic Programming Model (IPM) provides low-level access to vector registers for SIMDization. However, programming with IPM requires a great deal of expertise, especially in low-level architectural features; choosing suitable instructions and a vectorization methodology for mapping a given algorithm is therefore important. Moreover, portability, compatibility, scalability, and compiler optimization might limit the advantages of IPM. The goals of this paper are as follows. First, we provide a review of SIMD technology in general and Intel's SIMD extensions in particular. Second, we comparatively discuss some features of Intel's SIMD technologies (MMX, the SSE family, AVX, and FMA) in terms of ISA, vector width, and SIMD programming tools. Third, in order to compare the performance of different auto-vectorizers and IPM approaches using the Intel C++ Compiler (ICC), GNU Compiler Collection (GCC), and Low Level Virtual Machine (LLVM), we map and implement some representative multimedia kernels on the AVX and AVX2 extensions. Finally, our experimental results show that although the performance improvement of the IPM approach is higher than that of CAV, the programmer needs more programming effort and must know different mapping strategies. Therefore, extending auto-vectorizers' ability to generate more efficient vectorized code is an important issue for different compilers. (C) 2019 Elsevier Inc. All rights reserved.
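The contrast the abstract draws between CAV and IPM can be illustrated with a minimal sketch. It is not taken from the paper: the element-wise addition kernel and the names add_scalar, add_avx_impl, add_ipm, and add_variants_agree are hypothetical, and the code assumes GCC or Clang (for the target attribute and __builtin_cpu_supports). The first function is a plain loop of the kind an auto-vectorizer typically handles; the second maps the same loop onto 256-bit AVX registers by hand.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#define HAVE_X86 1
#else
#define HAVE_X86 0
#endif

/* CAV candidate (hypothetical kernel): a simple loop with no aliasing,
 * which most compilers can auto-vectorize at higher optimization levels. */
void add_scalar(const float *restrict a, const float *restrict b,
                float *restrict out, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}

#if HAVE_X86
/* IPM version: explicit AVX intrinsics, 8 floats per 256-bit register.
 * The target attribute (GCC/Clang assumption) enables AVX for this
 * function only, so no -mavx flag is needed for the whole file. */
__attribute__((target("avx")))
static void add_avx_impl(const float *a, const float *b,
                         float *out, size_t n) {
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);   /* unaligned load of 8 floats */
        __m256 vb = _mm256_loadu_ps(b + i);
        _mm256_storeu_ps(out + i, _mm256_add_ps(va, vb));
    }
    for (; i < n; ++i) {                      /* scalar tail for leftovers */
        out[i] = a[i] + b[i];
    }
}
#endif

/* Dispatcher: use the AVX path only when the CPU reports AVX support,
 * otherwise fall back to the scalar loop. Arrays must not overlap. */
void add_ipm(const float *a, const float *b, float *out, size_t n) {
    assert(a != NULL && b != NULL && out != NULL);
#if HAVE_X86
    if (__builtin_cpu_supports("avx")) {
        add_avx_impl(a, b, out, n);
        return;
    }
#endif
    add_scalar(a, b, out, n);
}

/* Returns 1 if both variants produce identical results on a small input
 * whose length (19) is deliberately not a multiple of 8. */
int add_variants_agree(void) {
    enum { N = 19 };
    float a[N], b[N], s[N], v[N];
    for (int i = 0; i < N; ++i) {
        a[i] = (float)i;
        b[i] = 0.5f * (float)i;
    }
    add_scalar(a, b, s, N);
    add_ipm(a, b, v, N);
    for (int i = 0; i < N; ++i) {
        if (s[i] != v[i]) {
            return 0;
        }
    }
    return 1;
}
```

Calling add_variants_agree() from a test or a main function checks that the hand-written version matches the plain loop. The intrinsic version needs an explicit tail loop, a CPU feature check, and compiler-specific attributes, which is a small instance of the extra effort and portability cost the abstract attributes to IPM.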
Pages: 83-100
Page count: 18