Fast and small short vector SIMD matrix multiplication kernels for the synergistic processing element of the CELL processor

Cited by: 0
Authors
Alvaro, Wesley [1]
Kurzak, Jakub [1]
Dongarra, Jack [1]
Affiliation
[1] Univ Tennessee, Knoxville, TN 37996 USA
DOI
Not available
Chinese Library Classification (CLC) number
TP301 [Theory and methods]
Discipline classification code
081202
Abstract
Matrix multiplication is one of the most common numerical operations, especially in dense linear algebra, where it forms the core of many important algorithms, including solvers for systems of linear equations, least squares problems, and singular value and eigenvalue computations. The STI CELL processor exceeds the capabilities of any other processor available today in terms of peak single-precision floating-point performance. To fully exploit the potential of the CELL processor for a wide range of numerical algorithms, a fast implementation of matrix multiplication is essential. The crucial component is the matrix multiplication kernel crafted for the short-vector Single Instruction Multiple Data (SIMD) architecture of the Synergistic Processing Element (SPE) of the CELL processor. This paper presents single-precision matrix multiplication kernels that implement the operations C = C − A × Bᵀ and C = C − A × B for matrices of 64 × 64 elements. For the latter case, a performance of 25.55 Gflop/s, or 99.80 percent of peak, is reported, using as little as 5.9 KB of storage for code and auxiliary data structures.
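The abstract states the two kernel operations only as formulas. The following minimal Python sketch spells out their plain scalar semantics, together with the flop-count arithmetic behind the percent-of-peak figure. It is not the authors' SIMD implementation. The function names are hypothetical, and the SPE peak of 25.6 Gflop/s (a 4-wide single-precision fused multiply-add per cycle, i.e. 8 flops/cycle, at a 3.2 GHz clock) is an assumption supplied for illustration rather than a figure taken from this abstract.

```python
# A plain scalar reference for the two operations named in the abstract.
# Matrices are square, row-major lists of lists. Python floats are double
# precision, so this checks semantics only; the paper's kernels are
# single-precision SIMD code for the SPE. All names here are hypothetical.

BLOCK = 64  # matrix size used by the kernels described in the abstract


def gemm_sub_abt(C, A, B):
    """In place: C = C - A x B^T, i.e. C[i][j] -= sum_k A[i][k] * B[j][k]."""
    n = len(C)
    for i in range(n):
        for j in range(n):
            # Row i of A dotted with row j of B (row j of B is column j of B^T).
            C[i][j] -= sum(A[i][k] * B[j][k] for k in range(n))
    return C


def gemm_sub_ab(C, A, B):
    """In place: C = C - A x B, i.e. C[i][j] -= sum_k A[i][k] * B[k][j]."""
    n = len(C)
    for i in range(n):
        for j in range(n):
            # Row i of A dotted with column j of B (strided access into B).
            C[i][j] -= sum(A[i][k] * B[k][j] for k in range(n))
    return C


def flop_count(n):
    """Each of the n*n entries of C takes n multiply-adds: 2 * n**3 flops."""
    return 2 * n ** 3


# Assumption: one SPE issues a 4-wide single-precision fused multiply-add
# per cycle (8 flops/cycle) at a 3.2 GHz clock, giving 25.6 Gflop/s.
SPE_PEAK_GFLOPS = 8 * 3.2


def percent_of_peak(gflops):
    """Achieved rate as a percentage of the assumed SPE peak."""
    return 100.0 * gflops / SPE_PEAK_GFLOPS


if __name__ == "__main__":
    print(flop_count(BLOCK))                 # 524288 flops per 64 x 64 call
    print(round(percent_of_peak(25.55), 2))  # 99.8
```

Under these assumptions, one 64 × 64 call performs 2 · 64³ = 524,288 flops, so 25.55 Gflop/s corresponds to roughly 20.5 µs per call, and 25.55 / 25.6 ≈ 0.998, which agrees with the 99.80 percent quoted in the abstract.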
Pages: 935–944
Page count: 10