Scalar Processing Overhead on SIMD-Only Architectures

Cited by: 0

Authors:

Azevedo, Arnaldo [1]

Juurlink, Ben [1]

Affiliations:

[1] Delft Univ Technol, Fac Elect Engn Math & Comp Sci, Comp Engn Grp, Delft, Netherlands

Source:

2009 20TH IEEE INTERNATIONAL CONFERENCE ON APPLICATION-SPECIFIC SYSTEMS, ARCHITECTURES AND PROCESSORS | 2009

Keywords:

Computer architecture; Datapath; SIMD processing; SIMD overhead;

DOI:

Not available

Chinese Library Classification number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812;

Abstract:

The Cell processor consists of a general-purpose core and eight cores with a complete SIMD instruction set. Although originally designed for multimedia and gaming, it is currently being used for a much broader range of applications. In this paper we evaluate if the Cell SPEs could benefit significantly from a scalar processing unit using two methodologies. In the first methodology the scalar processing overhead is eliminated by replacing all scalar data types by the quadword data type. This methodology is feasible only for relatively small kernels. In the second methodology SPE performance is compared to the performance of a similarly configured PPU, which supports scalar operations. Experimental results show that the scalar processing overhead ranges from 19% to 57% for small kernels and from 12% to 39% for large kernels. Solutions to eliminate this overhead are also discussed.

Pages: 183 - 190

Page count: 8

50 items in total

[41] VLASPH: Smoothed Particle Hydrodynamics on VLA SIMD Architectures
Fan, Xiaokang
Ge, Zhen
Long, Sifan
Tang, Tao
Huang, Chun
Peng, Lin
Yang, Canqun
EURO-PAR 2024: PARALLEL PROCESSING, PT III, EURO-PAR 2024, 2024, 14803 : 371 - 385
[42] Scalar Waving: Improving the Efficiency of SIMD Execution on GPUs
Yilmazer, Ayse
Chen, Zhongliang
Kaeli, David
2014 IEEE 28TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM, 2014,
[43] Cholesky factorization on SIMD multi-core architectures
Lemaitre, Florian
Couturier, Benjamin
Lacassagne, Lionel
JOURNAL OF SYSTEMS ARCHITECTURE, 2017, 79 : 1 - 15
[44] PIPELINING TREE-STRUCTURED ALGORITHMS ON SIMD ARCHITECTURES
BARNARD, DT
SKILLICORN, DB
INFORMATION PROCESSING LETTERS, 1990, 35 (02) : 79 - 84
[45] A CHOLESKY UPDATING AND DOWNDATING ALGORITHM FOR SYSTOLIC AND SIMD ARCHITECTURES
BISCHOF, CH
PAN, CT
TANG, PTP
SIAM JOURNAL ON SCIENTIFIC COMPUTING, 1993, 14 (03) : 670 - 676
[46] Influences of SIMD Architectures for Scattered Data Interpolation Algorithm
Tournier, Jean-Charles
Naef, Martin
2010 IEEE INTERNATIONAL SYMPOSIUM ON PERFORMANCE ANALYSIS OF SYSTEMS AND SOFTWARE (ISPASS 2010), 2010, : 109 - 110
[47] A flexible algorithm for calculating pair interactions on SIMD architectures
Pall, Szilard
Hess, Berk
COMPUTER PHYSICS COMMUNICATIONS, 2013, 184 (12) : 2641 - 2650
[48] Performance Improvement of Multimedia Kernels by Alleviating Overhead Instructions on SIMD Devices
Shahbahrami, Asadollah
Juurlink, Ben
ADVANCED PARALLEL PROCESSING TECHNOLOGIES, PROCEEDINGS, 2009, 5737 : 389 - 407
[49] Reconfigurable SIMD units for image processing
Aguado, David
Revenga, Pedro
Lazaro, Jose Luis
Derutin, Jean Pierre
2007 IEEE INTERNATIONAL SYMPOSIUM ON INTELLIGENT SIGNAL PROCESSING, CONFERENCE PROCEEDINGS BOOK, 2007, : 663 - +
[50] AUGMENTING ADA FOR SIMD PARALLEL PROCESSING
CLINE, CL
SIEGEL, HJ
IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, 1985, 11 (09) : 970 - 977
