Optimizing matrix multiplication for a short-vector SIMD architecture - CELL processor

Cited by: 35
Authors
Kurzak, Jakub [1]
Alvaro, Wesley [1]
Dongarra, Jack [1, 2, 3, 4]
Affiliations
[1] Univ Tennessee, Dept Elect Engn & Comp Sci, Knoxville, TN 37996 USA
[2] Oak Ridge Natl Lab, Div Math & Comp Sci, Oak Ridge, TN USA
[3] Univ Manchester, Sch Math, Manchester, England
[4] Univ Manchester, Sch Comp Sci, Manchester, England
Keywords
Instruction level parallelism; Single Instruction Multiple Data; Synergistic Processing Element; Loop optimizations; Vectorization; LINEAR-EQUATIONS; SOLVING SYSTEMS
DOI
10.1016/j.parco.2008.12.010
Chinese Library Classification number
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
Matrix multiplication is one of the most common numerical operations, especially in the area of dense linear algebra, where it forms the core of many important algorithms, including solvers of linear systems of equations, least squares problems, and singular value and eigenvalue computations. The STI CELL processor exceeds the capabilities of any other processor available today in terms of peak single-precision floating-point performance, aside from special-purpose accelerators like Graphics Processing Units (GPUs). In order to fully exploit the potential of the CELL processor for a wide range of numerical algorithms, a fast implementation of the matrix multiplication operation is essential. The crucial component is the matrix multiplication kernel crafted for the short-vector Single Instruction Multiple Data architecture of the Synergistic Processing Element of the CELL processor. In this paper, single-precision matrix multiplication kernels are presented implementing the C = C - A × B^T operation and the C = C - A × B operation for matrices of size 64 × 64 elements. For the latter case, a performance of 25.55 Gflop/s is reported, or 99.80% of the peak, using as little as 5.9 kB of storage for code and auxiliary data structures. © 2009 Elsevier B.V. All rights reserved.
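As a point of reference for the two operations named in the abstract, the following is a minimal, unoptimized Python sketch. It is not the authors' SIMD kernel and only spells out what each operation computes. The function names (`gemm_nt_sub`, `gemm_nn_sub`, `fraction_of_peak`) and the list-of-lists layout are hypothetical, and Python floats are double precision whereas the paper works in single precision. The peak figure assumes a 3.2 GHz Synergistic Processing Element issuing one 4-wide fused multiply-add per cycle; the abstract itself does not state these parameters.

```python
# Plain triple-loop reference versions of the two tile operations.
# All names here are illustrative assumptions, not taken from the paper.

TILE = 64  # tile size quoted in the abstract


def gemm_nt_sub(C, A, B):
    """Update C in place: C = C - A x B^T (square row-major lists of lists)."""
    n = len(C)
    for i in range(n):
        for j in range(n):
            acc = 0.0
            for k in range(n):
                # Row i of A against row j of B, i.e. column j of B^T.
                acc += A[i][k] * B[j][k]
            C[i][j] -= acc
    return C


def gemm_nn_sub(C, A, B):
    """Update C in place: C = C - A x B (square row-major lists of lists)."""
    n = len(C)
    for i in range(n):
        for j in range(n):
            acc = 0.0
            for k in range(n):
                # Row i of A against column j of B.
                acc += A[i][k] * B[k][j]
            C[i][j] -= acc
    return C


def flops_per_tile(n=TILE):
    """One multiply and one add per inner iteration: 2 * n^3 operations."""
    return 2 * n ** 3


def fraction_of_peak(gflops, clock_ghz=3.2, lanes=4, flops_per_lane=2):
    """Ratio of measured rate to an ASSUMED peak of clock * lanes * 2 (FMA)."""
    peak = clock_ghz * lanes * flops_per_lane  # 25.6 Gflop/s with the defaults
    return gflops / peak


if __name__ == "__main__":
    A = [[1.0, 2.0], [3.0, 4.0]]
    B = [[5.0, 6.0], [7.0, 8.0]]
    print(gemm_nt_sub([[0.0, 0.0], [0.0, 0.0]], A, B))
    print(gemm_nn_sub([[0.0, 0.0], [0.0, 0.0]], A, B))
    print(flops_per_tile())
    print(round(100 * fraction_of_peak(25.55), 2))
```

Under the stated assumptions the peak is 25.6 Gflop/s, so 25.55 Gflop/s corresponds to roughly 99.8% of peak, which is consistent with the figure quoted in the abstract.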
Pages: 138-150
Number of pages: 13