Avoiding Conversion and Rearrangement Overhead in SIMD Architectures

Cited by: 0
Authors
Asadollah Shahbahrami
Ben Juurlink
Demid Borodin
Stamatis Vassiliadis
Affiliations
[1] Delft University of Technology,Computer Engineering Laboratory, Faculty of Electrical Engineering, Mathematics, and Computer Science
[2] Guilan University,Department of Electrical and Computer Engineering, Faculty of Engineering
Keywords
Embedded media processors; multimedia kernels; register file; subword parallelism
DOI
Not available
Abstract
Single-Instruction Multiple-Data (SIMD) instructions provide an inexpensive way to exploit the Data-Level Parallelism in multimedia applications. However, the performance improvement obtained by employing SIMD instructions is often limited, because many overhead instructions are frequently required to bring data into a form amenable to SIMD processing. In this paper, we employ two techniques to overcome this limitation. The first technique, extended subwords, uses four extra bits for every byte in a media register. This allows many SIMD operations to be performed without overflow and avoids packing/unpacking conversion overhead. The second technique, the Matrix Register File (MRF), allows flexible row-wise as well as column-wise access to the register file. It is useful for many two-dimensional multimedia algorithms such as the (Inverse) Discrete Cosine Transform, the 2 × 2 Haar Transform, and pixel padding. In addition, we propose a few new media instructions. Experimental results obtained by extending the SimpleScalar toolset show that these techniques improve performance by up to a factor of 4.5 compared to a conventional SIMD instruction set extension.
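The two techniques named in the abstract can be illustrated with a small software model. The sketch below is not the authors' implementation or instruction set. It assumes unsigned lanes with wraparound arithmetic, and every name in it (add_lanes, sum_rows, read_row, read_column) is hypothetical. The first part shows why 12-bit lanes (8 bits plus the 4 extra bits) can hold sums that overflow 8-bit lanes. The second part shows the kind of row-wise and column-wise access an MRF offers, which a conventional register file would need rearrangement instructions to emulate.

```python
# Illustrative software model only. Lane widths follow the abstract
# (8-bit bytes, 4 extra bits each); all function names are hypothetical.

def add_lanes(a, b, bits):
    """Lane-wise wraparound add of two equal-length lists of unsigned lanes."""
    mask = (1 << bits) - 1
    return [(x + y) & mask for x, y in zip(a, b)]

def sum_rows(rows, bits):
    """Accumulate several rows lane by lane at the given lane width."""
    acc = [0] * len(rows[0])
    for row in rows:
        acc = add_lanes(acc, row, bits)
    return acc

# Four identical rows of 8-bit pixel values.
pixels = [[200, 255, 17, 128]] * 4

wrapped = sum_rows(pixels, 8)    # plain 8-bit lanes wrap around
extended = sum_rows(pixels, 12)  # 12-bit lanes hold the exact sums

# A 4 x 4 block viewed as a register file that can be read both ways.
def read_row(mrf, i):
    """Row-wise access: one register holds one row of the block."""
    return list(mrf[i])

def read_column(mrf, j):
    """Column-wise access: gather element j of every row, no transpose step."""
    return [row[j] for row in mrf]

block = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]

print(wrapped)                # [32, 252, 68, 0]
print(extended)               # [800, 1020, 68, 512]
print(read_row(block, 1))     # [5, 6, 7, 8]
print(read_column(block, 1))  # [2, 6, 10, 14]
```

With 8-bit lanes the sums of four pixels are lost to wraparound, so a conventional SIMD extension would first unpack to a wider type and later pack back. With four extra bits per byte, up to 16 maximal 8-bit values can be accumulated before a lane could overflow.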
Pages: 237-260
Number of pages: 23
Related papers
50 in total
  • [1] Avoiding conversion and rearrangement overhead in SIMD architectures
    Shahbahrami, Asadollah
    Juurlink, Ben
    Borodin, Demid
    Vassiliadis, Stamatis
    INTERNATIONAL JOURNAL OF PARALLEL PROGRAMMING, 2006, 34 (03) : 237 - 260
  • [2] Scalar Processing Overhead on SIMD-Only Architectures
    Azevedo, Arnaldo
    Juurlink, Ben
    2009 20TH IEEE INTERNATIONAL CONFERENCE ON APPLICATION-SPECIFIC SYSTEMS, ARCHITECTURES AND PROCESSORS, 2009, : 183 - 190
  • [3] Advanced SIMD: Extending the Reach of Contemporary SIMD Architectures
    Boettcher, Matthias
    Al-Hashimi, Bashir M.
    Eyole, Mbou
    Gabrielli, Giacomo
    Reid, Alastair
    2014 DESIGN, AUTOMATION AND TEST IN EUROPE CONFERENCE AND EXHIBITION (DATE), 2014,
  • [4] MIMD programs on SIMD architectures
    Wu, MY
    Shu, W
    FRONTIERS '96 - THE SIXTH SYMPOSIUM ON FRONTIERS OF MASSIVELY PARALLEL COMPUTING, PROCEEDINGS, 1996, : 162 - 170
  • [5] Recursive filtering on SIMD architectures
    Schaffer, R
    Hosemann, M
    Merker, R
    Fettweis, G
    SIPS 2003: IEEE WORKSHOP ON SIGNAL PROCESSING SYSTEMS: DESIGN AND IMPLEMENTATION, 2003, : 263 - 268
  • [6] Efficient fuzzy compiler for SIMD architectures
    Frías-Martínez, E
    Gutiérrez-Ríos, J
    Fernández-Hernández, F
    APPLIED SOFT COMPUTING, 2004, 4 (03) : 287 - 301
  • [7] Vectorization for SIMD Architectures with alignment constraints
    Eichenberger, AE
    Wu, P
    O'Brien, K
    ACM SIGPLAN NOTICES, 2004, 39 (06) : 82 - 93
  • [8] Characterization of Quantum Workloads on SIMD Architectures
    Risque, Robert
    Jog, Adwait
    PROCEEDINGS OF THE 2016 IEEE INTERNATIONAL SYMPOSIUM ON WORKLOAD CHARACTERIZATION, 2016, : 34 - 42
  • [9] Fractal terrain generation for SIMD architectures
    Boyapati, Meghashyam
    Rankin, John R.
    INTERNATIONAL JOURNAL OF COMPUTER APPLICATIONS IN TECHNOLOGY, 2009, 34 (04) : 298 - 302
  • [10] Reduced complexity SIMD - Class architectures
    Glover, MA
    Rucinski, A
    Miller, WT
    EIGHTH ANNUAL IEEE INTERNATIONAL CONFERENCE ON INNOVATIVE SYSTEMS IN SILICON, 1996 PROCEEDINGS, 1996, : 352 - 361