Avoiding Conversion and Rearrangement Overhead in SIMD Architectures

Authors
Asadollah Shahbahrami
Ben Juurlink
Demid Borodin
Stamatis Vassiliadis
Affiliations
[1] Delft University of Technology, Computer Engineering Laboratory, Faculty of Electrical Engineering, Mathematics, and Computer Science
[2] Guilan University, Department of Electrical and Computer Engineering, Faculty of Engineering
Keywords
Embedded media processors; multimedia kernels; register file; subword parallelism
Abstract
Single-Instruction Multiple-Data (SIMD) instructions provide an inexpensive way to exploit the data-level parallelism in multimedia applications. However, the performance improvement obtained by employing SIMD instructions is often limited because many overhead instructions are frequently required to bring data into a form amenable to SIMD processing. In this paper, we employ two techniques to overcome this limitation. The first technique, extended subwords, uses four extra bits for every byte in a media register. This allows many SIMD operations to be performed without overflow and avoids packing/unpacking conversion overhead. The second technique, the Matrix Register File (MRF), allows flexible row-wise as well as column-wise access to the register file. It is useful for many two-dimensional multimedia algorithms such as the (Inverse) Discrete Cosine Transform, the 2 × 2 Haar Transform, and pixel padding. In addition, we propose a few new media instructions. Experimental results obtained by extending the SimpleScalar toolset show that these techniques improve performance by up to a factor of 4.5 compared to a conventional SIMD instruction set extension.
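The two techniques named in the abstract can be illustrated with a small software model. The sketch below is not the paper's instruction set: the names `simd_add` and `MatrixRegisterFile`, the four-lane register width, and the sample values are all assumptions chosen for illustration. It only shows why 12-bit lanes (a byte plus four extra bits) avoid the wraparound seen in 8-bit lanes, and how column-wise register access yields a transposed block without separate rearrangement steps.

```python
# Illustrative model only. All names here are hypothetical and are not
# taken from the paper; lane widths follow the abstract (8-bit bytes,
# plus four extra bits per byte = 12-bit extended subwords).

def simd_add(a, b, lane_bits):
    """Lane-wise add with wraparound, as a packed SIMD add would behave."""
    mask = (1 << lane_bits) - 1
    return [(x + y) & mask for x, y in zip(a, b)]


class MatrixRegisterFile:
    """Toy register file whose contents can be read by row or by column."""

    def __init__(self, rows):
        self.regs = [list(r) for r in rows]

    def read_row(self, i):
        return list(self.regs[i])

    def read_col(self, j):
        return [row[j] for row in self.regs]


pixels_a = [200, 100, 255, 17]
pixels_b = [100, 100, 255, 3]

# 8-bit lanes wrap on overflow, so conventional code must first unpack
# the bytes to a wider type and pack the results again afterwards.
wrapped = simd_add(pixels_a, pixels_b, 8)

# 12-bit lanes hold the exact sums; they can accumulate up to 16
# unsigned bytes, since 16 * 255 = 4080 < 4096.
extended = simd_add(pixels_a, pixels_b, 12)

# Column-wise reads deliver the transpose of a 4 x 4 block directly.
mrf = MatrixRegisterFile([[0, 1, 2, 3],
                          [4, 5, 6, 7],
                          [8, 9, 10, 11],
                          [12, 13, 14, 15]])
transposed = [mrf.read_col(j) for j in range(4)]

print(wrapped)
print(extended)
print(transposed[1])
```

In this model the column read stands in for the rearrangement (shuffle or transpose) instructions that a row-only register file would need between the row and column passes of a two-dimensional transform.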
Pages: 237-260
Page count: 23