Fast and small short vector SIMD matrix multiplication kernels for the synergistic processing element of the CELL processor

Cited by: 0

Authors:

Alvaro, Wesley [1]

Kurzak, Jakub [1]

Dongarra, Jack [1]

Affiliations:

[1] Univ Tennessee, Knoxville, TN 37996 USA

Source:

COMPUTATIONAL SCIENCE - ICCS 2008, PT 1 | 2008 / Vol. 5101

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP301 [Theory, methods];

Discipline classification code:

081202;

Abstract:

Matrix multiplication is one of the most common numerical operations, especially in the area of dense linear algebra, where it forms the core of many important algorithms, including solvers of linear systems of equations, least squares problems, and singular value and eigenvalue computations. The STI CELL processor exceeds the capabilities of any other processor available today in terms of peak single precision floating point performance. In order to fully exploit the potential of the CELL processor for a wide range of numerical algorithms, a fast implementation of the matrix multiplication operation is essential. The crucial component is the matrix multiplication kernel crafted for the short vector Single Instruction Multiple Data (SIMD) architecture of the Synergistic Processing Element of the CELL processor. In this paper, single precision matrix multiplication kernels are presented implementing the C = C - A x B^T operation and the C = C - A x B operation for matrices of size 64 x 64 elements. For the latter case, a performance of 25.55 Gflop/s is reported, or 99.80 percent of the peak, using as little as 5.9 KB of storage for code and auxiliary data structures.


Pages: 935 - 944

Page count: 10

Related items (8 in total):

[1] Optimizing matrix multiplication for a short-vector SIMD architecture - CELL processor
Kurzak, Jakub
Alvaro, Wesley
Dongarra, Jack
PARALLEL COMPUTING, 2009, 35 (03) : 138 - 150
[2] Fast Sparse Matrix-Vector Multiplication on Graphics Processing Unit for Finite Element Analysis
Ahamed, Abal-Kassim Cheik
Magoules, Frederic
2012 IEEE 14TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS & 2012 IEEE 9TH INTERNATIONAL CONFERENCE ON EMBEDDED SOFTWARE AND SYSTEMS (HPCC-ICESS), 2012, : 1307 - 1314
[3] The vector fixed point unit of the synergistic processor element of the cell architecture processor
Mäding, N
Leenstra, J
Pille, J
Sautter, R
Büttner, S
Ehrenreich, S
Haller, W
ESSCIRC 2005: PROCEEDINGS OF THE 31ST EUROPEAN SOLID-STATE CIRCUITS CONFERENCE, 2005, : 203 - 206
[4] The vector floating-point unit in a synergistic processor element of a CELL processor
Mueller, SM
Jacobi, C
Oh, HJ
Tran, KD
Cottier, SR
Michael, BW
Nishikawa, H
Totsuka, Y
Namatame, T
Yano, N
Machida, T
Dhong, SH
17th IEEE Symposium on Computer Arithmetic, Proceedings, 2005, : 59 - 67
[5] The Vector Fixed Point Unit of the synergistic processor element of the cell architecture processor
Maeding, N.
Leenstra, J.
Pille, J.
Sautter, R.
Buettner, S.
Ehrenreich, S.
Haller, W.
2006 DESIGN AUTOMATION AND TEST IN EUROPE, VOLS 1-3, PROCEEDINGS, 2006, : 1579 - +
[6] Finite-Element Sparse Matrix Vector Multiplication on Graphic Processing Units
Dehnavi, Maryam Mehri
Fernandez, David M.
Giannacopoulos, Dennis
IEEE TRANSACTIONS ON MAGNETICS, 2010, 46 (08) : 2982 - 2985
[7] A new sparse matrix vector multiplication graphics processing unit algorithm designed for finite element problems
Wong, J.
Kuhl, E.
Darve, E.
INTERNATIONAL JOURNAL FOR NUMERICAL METHODS IN ENGINEERING, 2015, 102 (12) : 1784 - 1814
[8] An efficient sparse matrix-vector multiplication on CUDA-enabled graphic processing units for finite element method simulations
Altinkaynak, Atakan
INTERNATIONAL JOURNAL FOR NUMERICAL METHODS IN ENGINEERING, 2017, 110 (01) : 57 - 78
