Avoiding Conversion and Rearrangement Overhead in SIMD Architectures

Cited by: 0

Authors:

Asadollah Shahbahrami

Ben Juurlink

Demid Borodin

Stamatis Vassiliadis

Affiliations:

[1] Delft University of Technology, Computer Engineering Laboratory, Faculty of Electrical Engineering, Mathematics, and Computer Science

[2] Guilan University, Department of Electrical and Computer Engineering, Faculty of Engineering

Source:

International Journal of Parallel Programming | 2006 / Vol. 34

Keywords:

Embedded media processors; multimedia kernels; register file; subword parallelism

DOI:

Not available

Abstract:

Single-Instruction Multiple-Data (SIMD) instructions provide an inexpensive way to exploit the Data-Level Parallelism in multimedia applications. However, the performance improvement obtained by employing SIMD instructions is often limited because many overhead instructions are frequently required to bring data into a form amenable to SIMD processing. In this paper, we employ two techniques to overcome this limitation. The first technique, extended subwords, uses four extra bits for every byte in a media register. This allows many SIMD operations to be performed without overflow and avoids packing/unpacking conversion overhead. The second technique, the Matrix Register File (MRF), allows flexible row-wise as well as column-wise access to the register file. It is useful for many two-dimensional multimedia algorithms such as the (Inverse) Discrete Cosine Transform, the 2 × 2 Haar Transform, and pixel padding. In addition, we propose a few new media instructions. Experimental results obtained by extending the SimpleScalar toolset show that these techniques improve performance by up to a factor of 4.5 compared to a conventional SIMD instruction set extension.
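
The two techniques named in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions: the function and class names (`simd_add`, `MatrixRegisterFile`, `read_row`, `read_col`), the wrap-around arithmetic, and the 4 × 4 register layout are hypothetical and are not taken from the paper's instruction set. It shows why 12-bit lanes (8 data bits plus the 4 extra bits) can hold the sum of two 8-bit pixels without unpacking to a wider type, and how column-wise register access yields a transposed view without rearrangement instructions.

```python
# Illustrative model only. Names, widths, and layout are assumptions,
# not the instruction set proposed in the paper.

BYTE_BITS = 8
EXT_BITS = 12  # 8 data bits + 4 extra bits per subword, as stated in the abstract


def simd_add(a, b, bits):
    """Lane-wise unsigned add that wraps around at the given subword width."""
    mask = (1 << bits) - 1
    return [(x + y) & mask for x, y in zip(a, b)]


class MatrixRegisterFile:
    """Toy register file whose contents can be read by row or by column."""

    def __init__(self, rows):
        self.rows = [list(r) for r in rows]

    def read_row(self, i):
        # Conventional access: one register holds one row of the block.
        return list(self.rows[i])

    def read_col(self, j):
        # Column-wise access: element j of every row, with no shuffle step.
        return [row[j] for row in self.rows]


pixels_a = [200, 100, 255, 17]
pixels_b = [100, 50, 255, 3]

# With plain 8-bit lanes the first and third sums overflow and wrap.
wrapped = simd_add(pixels_a, pixels_b, BYTE_BITS)

# With 12-bit extended subwords every sum fits, so no unpack/pack is needed.
exact = simd_add(pixels_a, pixels_b, EXT_BITS)

mrf = MatrixRegisterFile([
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
])
second_row = mrf.read_row(1)
second_col = mrf.read_col(1)

print(wrapped, exact, second_row, second_col)
```

In a conventional SIMD extension, the 8-bit case would typically require unpacking to 16-bit lanes before the add and repacking afterwards, and the column read would require a sequence of rearrangement instructions. Those are the kinds of overhead the abstract refers to.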

Pages: 237-260

Number of pages: 23
