Avoiding Conversion and Rearrangement Overhead in SIMD Architectures

Authors
Asadollah Shahbahrami
Ben Juurlink
Demid Borodin
Stamatis Vassiliadis
Affiliations
[1] Delft University of Technology,Computer Engineering Laboratory, Faculty of Electrical Engineering, Mathematics, and Computer Science
[2] Guilan University,Department of Electrical and Computer Engineering, Faculty of Engineering
Keywords
Embedded media processors; multimedia kernels; register file; subword parallelism
DOI
Not available
Abstract
Single-Instruction Multiple-Data (SIMD) instructions provide an inexpensive way to exploit the data-level parallelism in multimedia applications. However, the performance improvement obtained by employing SIMD instructions is often limited, because many overhead instructions are frequently required to bring data into a form amenable to SIMD processing. In this paper, we employ two techniques to overcome this limitation. The first technique, extended subwords, uses four extra bits for every byte in a media register. This allows many SIMD operations to be performed without overflow and avoids packing/unpacking conversion overhead. The second technique, the Matrix Register File (MRF), allows flexible row-wise as well as column-wise access to the register file. It is useful for many two-dimensional multimedia algorithms such as the (Inverse) Discrete Cosine Transform, the 2 × 2 Haar Transform, and pixel padding. In addition, we propose a few new media instructions. Experimental results obtained by extending the SimpleScalar toolset show that these techniques improve performance by up to a factor of 4.5 compared to a conventional SIMD instruction set extension.
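The two techniques named in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the paper's instruction set: the helper names `lane_add` and `mrf_read` are hypothetical, lanes are modelled as Python integers masked to a fixed width, and the register block is modelled as a list of lists. It shows that a 12-bit lane (a byte plus four extra bits) holds sums that wrap around in an 8-bit lane, and that a matrix-style register block can be read by row or by column without a separate transpose step.

```python
def lane_add(a, b, bits):
    """Element-wise add of two lane vectors; each lane wraps at `bits` bits.

    Hypothetical helper: models a packed SIMD add in software.
    """
    mask = (1 << bits) - 1
    return [(x + y) & mask for x, y in zip(a, b)]


def mrf_read(regs, index, column=False):
    """Read row `index` (default) or column `index` of a register block.

    Hypothetical helper: `regs` is a list of equally long rows, standing in
    for a group of media registers viewed as a matrix.
    """
    if column:
        return [row[index] for row in regs]
    return list(regs[index])


# Extended subwords: byte data added in 8-bit lanes versus 12-bit lanes.
pixels_a = [200, 100, 255, 17]
pixels_b = [100, 100, 255, 3]
wrapped = lane_add(pixels_a, pixels_b, 8)    # conventional byte lanes wrap
extended = lane_add(pixels_a, pixels_b, 12)  # four extra bits keep the sum

# Matrix register file: the same block read row-wise and column-wise.
block = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
row1 = mrf_read(block, 1)
col1 = mrf_read(block, 1, column=True)

print(wrapped, extended, row1, col1)
```

In this model, a 12-bit lane can accumulate up to sixteen unsigned byte values (16 × 255 = 4080 < 4096) before it would overflow, which gives a rough sense of why the extra bits can remove the need to unpack to wider subwords first.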
Pages: 237-260
Page count: 23