Avoiding Conversion and Rearrangement Overhead in SIMD Architectures

Authors:

Asadollah Shahbahrami

Ben Juurlink

Demid Borodin

Stamatis Vassiliadis

Affiliations:

[1] Delft University of Technology, Computer Engineering Laboratory, Faculty of Electrical Engineering, Mathematics, and Computer Science

[2] Guilan University, Department of Electrical and Computer Engineering, Faculty of Engineering

Source:

International Journal of Parallel Programming | 2006 / Volume 34

Keywords:

Embedded media processors; multimedia kernels; register file; subword parallelism

Abstract:

Single-Instruction Multiple-Data (SIMD) instructions provide an inexpensive way to exploit the data-level parallelism in multimedia applications. However, the performance improvement they deliver is often limited because many overhead instructions are needed to bring data into a form amenable to SIMD processing. In this paper, we employ two techniques to overcome this limitation. The first technique, extended subwords, uses four extra bits for every byte in a media register. This allows many SIMD operations to be performed without overflow and avoids packing/unpacking conversion overhead. The second technique, the Matrix Register File (MRF), allows flexible row-wise as well as column-wise access to the register file. It is useful for many two-dimensional multimedia algorithms such as the (Inverse) Discrete Cosine Transform, the 2 × 2 Haar Transform, and pixel padding. In addition, we propose a few new media instructions. Experimental results obtained by extending the SimpleScalar toolset show that these techniques improve performance by up to a factor of 4.5 compared to a conventional SIMD instruction set extension.
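
The two techniques named in the abstract can be pictured with a small software model. The sketch below is only an illustration under stated assumptions: the lane counts, the wraparound arithmetic, and the names `simd_add` and `MatrixRegisterFile` are hypothetical and are not taken from the paper's instruction set. It shows that byte sums which wrap in 8-bit lanes fit in 12-bit lanes (8 data bits plus 4 extra bits), and that a register block readable by column needs no separate rearrangement step.

```python
# Toy model for illustration; lane counts and all names below are
# assumptions, not the instruction set proposed in the paper.

def simd_add(a, b, lane_bits):
    """Lane-wise unsigned add that wraps around at lane_bits bits."""
    mask = (1 << lane_bits) - 1
    return [(x + y) & mask for x, y in zip(a, b)]


class MatrixRegisterFile:
    """An n x n block of subwords readable by row or by column."""

    def __init__(self, n):
        self.n = n
        self.cells = [[0] * n for _ in range(n)]

    def write_row(self, r, values):
        if len(values) != self.n:
            raise ValueError("row must hold exactly n subwords")
        self.cells[r] = list(values)

    def read_row(self, r):
        return list(self.cells[r])

    def read_column(self, c):
        return [self.cells[r][c] for r in range(self.n)]


pixels_a = [200, 17, 255, 0]
pixels_b = [100, 3, 255, 9]

# Plain 8-bit lanes: 200 + 100 and 255 + 255 wrap around.
wrapped = simd_add(pixels_a, pixels_b, 8)
# 12-bit lanes (8 data bits + 4 extra bits): the same sums fit.
extended = simd_add(pixels_a, pixels_b, 12)

mrf = MatrixRegisterFile(2)
mrf.write_row(0, [1, 2])
mrf.write_row(1, [3, 4])
column0 = mrf.read_column(0)  # column access, no shuffle step needed

print(wrapped)   # [44, 20, 254, 9]
print(extended)  # [300, 20, 510, 9]
print(column0)   # [1, 3]
```

In a conventional SIMD extension the bytes would typically be unpacked to wider lanes before the add and packed again afterwards, and a column would be assembled with shuffle or transpose instructions; those are the overhead instructions the abstract refers to.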

Pages: 237-260

Page count: 23
