Avoiding Conversion and Rearrangement Overhead in SIMD Architectures

Cited by: 0

Authors:

Asadollah Shahbahrami

Ben Juurlink

Demid Borodin

Stamatis Vassiliadis

Affiliations:

[1] Delft University of Technology, Computer Engineering Laboratory, Faculty of Electrical Engineering, Mathematics, and Computer Science

[2] Guilan University, Department of Electrical and Computer Engineering, Faculty of Engineering

Source:

International Journal of Parallel Programming | 2006 / Volume 34

Keywords:

Embedded media processors; multimedia kernels; register file; subword parallelism

DOI:

Not available

Abstract:

Single-Instruction Multiple-Data (SIMD) instructions provide an inexpensive way to exploit the data-level parallelism in multimedia applications. However, the performance improvement obtained by employing SIMD instructions is often limited because many overhead instructions are required to bring data into a form amenable to SIMD processing. In this paper, we employ two techniques to overcome this limitation. The first technique, extended subwords, uses four extra bits for every byte in a media register. This allows many SIMD operations to be performed without overflow and avoids packing/unpacking conversion overhead. The second technique, the Matrix Register File (MRF), allows flexible row-wise as well as column-wise access to the register file. It is useful for many two-dimensional multimedia algorithms such as the (Inverse) Discrete Cosine Transform, the 2 × 2 Haar Transform, and pixel padding. In addition, we propose a few new media instructions. Experimental results obtained by extending the SimpleScalar toolset show that these techniques improve performance by up to a factor of 4.5 compared to a conventional SIMD instruction set extension.
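
The two techniques named in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the authors' hardware design or instruction set: the abstract does not give lane counts, signedness, or instruction names, so the unsigned lanes, the 4-lane registers, and the helper names `add_subwords`, `mrf_read_row`, and `mrf_read_column` are all hypothetical. The only detail taken from the abstract is that each byte gets four extra bits (modeled here as 12-bit lanes) and that the register file can be read row-wise or column-wise.

```python
# Software model only; all names and sizes below are illustrative assumptions.

BYTE_BITS = 8   # conventional subword width
EXT_BITS = 12   # 8 data bits + 4 extra bits per subword, per the abstract


def add_subwords(a, b, bits):
    """Lane-wise unsigned add that wraps around at the given subword width."""
    mask = (1 << bits) - 1
    return [(x + y) & mask for x, y in zip(a, b)]


def mrf_read_row(regs, i):
    """Row-wise access: return the subwords of register i."""
    return list(regs[i])


def mrf_read_column(regs, j):
    """Column-wise access: gather subword j from every register."""
    return [row[j] for row in regs]


pixels_a = [200, 100, 255, 17]
pixels_b = [100, 100, 255, 3]

# With 8-bit lanes, sums above 255 wrap around, so real code would first
# unpack the bytes to wider lanes and pack them again afterwards.
wrapped = add_subwords(pixels_a, pixels_b, BYTE_BITS)

# With 12-bit lanes, the same sums fit without any unpack/pack step.
extended = add_subwords(pixels_a, pixels_b, EXT_BITS)

# A 4 x 4 block held in four registers can be read by row or by column,
# which avoids an explicit transpose built from rearrangement instructions.
block = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
first_row = mrf_read_row(block, 0)
first_col = mrf_read_column(block, 0)
```

In this model, the 8-bit result loses the carries for the lanes whose sum exceeds 255, while the 12-bit result keeps them, which is the conversion overhead the extended subwords are meant to avoid. The column read stands in for the transposition step that two-dimensional kernels otherwise perform with extra rearrangement instructions.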

Pages: 237-260

Page count: 23

50 items in total

[1] Avoiding conversion and rearrangement overhead in SIMD architectures
Shahbahrami, Asadollah
Juurlink, Ben
Borodin, Demid
Vassiliadis, Stamatis
INTERNATIONAL JOURNAL OF PARALLEL PROGRAMMING, 2006, 34 (03) : 237 - 260
[2] Scalar Processing Overhead on SIMD-Only Architectures
Azevedo, Arnaldo
Juurlink, Ben
2009 20TH IEEE INTERNATIONAL CONFERENCE ON APPLICATION-SPECIFIC SYSTEMS, ARCHITECTURES AND PROCESSORS, 2009, : 183 - 190
[3] Advanced SIMD: Extending the Reach of Contemporary SIMD Architectures
Boettcher, Matthias
Al-Hashimi, Bashir M.
Eyole, Mbou
Gabrielli, Giacomo
Reid, Alastair
2014 DESIGN, AUTOMATION AND TEST IN EUROPE CONFERENCE AND EXHIBITION (DATE), 2014
[4] MIMD programs on SIMD architectures
Wu, MY
Shu, W
FRONTIERS '96 - THE SIXTH SYMPOSIUM ON FRONTIERS OF MASSIVELY PARALLEL COMPUTING, PROCEEDINGS, 1996, : 162 - 170
[5] Recursive filtering on SIMD architectures
Schaffer, R
Hosemann, M
Merker, R
Fettweis, G
SIPS 2003: IEEE WORKSHOP ON SIGNAL PROCESSING SYSTEMS: DESIGN AND IMPLEMENTATION, 2003, : 263 - 268
[6] Efficient fuzzy compiler for SIMD architectures
Frías-Martínez, E
Gutiérrez-Ríos, J
Fernández-Hernández, F
APPLIED SOFT COMPUTING, 2004, 4 (03) : 287 - 301
[7] Vectorization for SIMD Architectures with alignment constraints
Eichenberger, AE
Wu, P
O'Brien, K
ACM SIGPLAN NOTICES, 2004, 39 (06) : 82 - 93
[8] Characterization of Quantum Workloads on SIMD Architectures
Risque, Robert
Jog, Adwait
PROCEEDINGS OF THE 2016 IEEE INTERNATIONAL SYMPOSIUM ON WORKLOAD CHARACTERIZATION, 2016, : 34 - 42
[9] Fractal terrain generation for SIMD architectures
Boyapati, Meghashyam
Rankin, John R.
INTERNATIONAL JOURNAL OF COMPUTER APPLICATIONS IN TECHNOLOGY, 2009, 34 (04) : 298 - 302
[10] Reduced complexity SIMD - Class architectures
Glover, MA
Rucinski, A
Miller, WT
EIGHTH ANNUAL IEEE INTERNATIONAL CONFERENCE ON INNOVATIVE SYSTEMS IN SILICON, 1996 PROCEEDINGS, 1996, : 352 - 361