Avoiding Conversion and Rearrangement Overhead in SIMD Architectures

Cited by: 0
Authors
Asadollah Shahbahrami
Ben Juurlink
Demid Borodin
Stamatis Vassiliadis
Affiliations
[1] Delft University of Technology, Computer Engineering Laboratory, Faculty of Electrical Engineering, Mathematics, and Computer Science
[2] Guilan University, Department of Electrical and Computer Engineering, Faculty of Engineering
Keywords
Embedded media processors; multimedia kernels; register file; subword parallelism
DOI
Not available
Abstract
Single-Instruction Multiple-Data (SIMD) instructions provide an inexpensive way to exploit the data-level parallelism in multimedia applications. However, the performance improvement obtained by employing SIMD instructions is often limited because many overhead instructions are required to bring data into a form amenable to SIMD processing. In this paper, we employ two techniques to overcome this limitation. The first technique, extended subwords, uses four extra bits for every byte in a media register. This allows many SIMD operations to be performed without overflow and avoids packing/unpacking conversion overhead. The second technique, the Matrix Register File (MRF), allows flexible row-wise as well as column-wise access to the register file. It is useful for many two-dimensional multimedia algorithms such as the (Inverse) Discrete Cosine Transform, the 2 × 2 Haar Transform, and pixel padding. In addition, we propose a few new media instructions. Experimental results obtained by extending the SimpleScalar toolset show that these techniques improve performance by up to a factor of 4.5 compared to a conventional SIMD instruction set extension.
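The two techniques named in the abstract can be illustrated with a small software model. The sketch below is not the authors' implementation or instruction set: the function names are hypothetical, lanes are modeled as Python lists, and the extended subwords are assumed to be unsigned 12-bit lanes (8 data bits plus the 4 extra bits the abstract mentions). It only shows why extra bits per byte can avoid overflow, and what row-wise versus column-wise register file access might look like.

```python
# Illustrative model only. Lane widths follow the abstract (8-bit bytes,
# 4 extra bits per byte); all names and the unsigned treatment are assumptions.

def add_wrapping_u8(a, b):
    """Lane-wise add in conventional 8-bit subwords (wraps modulo 256)."""
    return [(x + y) & 0xFF for x, y in zip(a, b)]


def add_extended_12(a, b):
    """Lane-wise add in 12-bit extended subwords (8 data bits + 4 extra)."""
    return [(x + y) & 0xFFF for x, y in zip(a, b)]


def read_row(mrf, i):
    """Row-wise access: return register i of a matrix-organized register file."""
    return list(mrf[i])


def read_column(mrf, j):
    """Column-wise access: gather subword j from every register."""
    return [row[j] for row in mrf]


pixels_a = [200, 100, 255, 17]
pixels_b = [100, 100, 255, 3]

# 8-bit lanes overflow, so real code would first unpack to 16-bit lanes.
print(add_wrapping_u8(pixels_a, pixels_b))   # [44, 200, 254, 20]
# 12-bit lanes hold the full sums without an unpack step.
print(add_extended_12(pixels_a, pixels_b))   # [300, 200, 510, 20]

block = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
print(read_row(block, 1))     # [5, 6, 7, 8]
print(read_column(block, 1))  # [2, 6, 10, 14]
```

In this model, column access replaces the explicit transpose (rearrangement) step that a two-dimensional kernel would otherwise need between its row pass and its column pass.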
Pages: 237-260
Page count: 23