PARALLEL MATRIX TRANSPOSE ALGORITHMS ON DISTRIBUTED-MEMORY CONCURRENT COMPUTERS

Cited by: 24
Authors
CHOI, JY
DONGARRA, JJ
WALKER, DW
Affiliations
[1] OAK RIDGE NATL LAB,MATH SCI SECT,OAK RIDGE,TN 37831
[2] UNIV TENNESSEE,DEPT COMP SCI,KNOXVILLE,TN 37996
Keywords
LINEAR ALGEBRA; MATRIX TRANSPOSE ALGORITHM; DISTRIBUTED MEMORY MULTIPROCESSORS; POINT-TO-POINT COMMUNICATION; INTEL TOUCHSTONE DELTA;
DOI
10.1016/0167-8191(95)00016-H
Chinese Library Classification (CLC) number
TP301 [Theory and methods];
Subject classification code
081202 ;
Abstract
This paper describes parallel matrix transpose algorithms on distributed-memory concurrent processors. We assume that the matrix is distributed over a P × Q processor template with a block cyclic data distribution. P, Q, and the block size can be arbitrary, so the algorithms have wide applicability. The communication schemes of the algorithms are determined by the greatest common divisor (GCD) of P and Q. If P and Q are relatively prime, the matrix transpose algorithm involves complete exchange communication. If P and Q are not relatively prime, processors are divided into GCD groups and the communication operations are overlapped for different groups of processors. Processors transpose GCD wrapped diagonal blocks simultaneously, and the matrix can be transposed with LCM/GCD steps, where LCM is the least common multiple of P and Q. The algorithms make use of non-blocking, point-to-point communication between processors. The use of non-blocking communication allows a processor to overlap the messages that it sends to different processors, thereby avoiding unnecessary synchronization. Combined with the matrix multiplication routine, C = A · B, the algorithms are used to compute parallel multiplications of transposed matrices, C = A^T · B^T, in the PUMMA package [5]. Details of the parallel implementation of the algorithms are given, and results are presented for runs on the Intel Touchstone Delta computer.
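The GCD/LCM step count stated in the abstract can be checked with a small schedule generator. The Python sketch below is an illustration under stated assumptions, not the paper's code: it assumes block (i, j) is stored on processor (i mod P, j mod Q), that the transposed matrix uses the same P × Q template, and that the LCM wrapped diagonals of an LCM × LCM tile of blocks are handled GCD at a time. The helper names owner and transpose_schedule, and this particular grouping of diagonals into steps, are hypothetical. It only computes which processor sends which block to whom; the non-blocking sends and receives themselves are left out.

```python
from math import gcd


def lcm(a: int, b: int) -> int:
    """Least common multiple of two positive integers."""
    return a * b // gcd(a, b)


def owner(i: int, j: int, P: int, Q: int) -> tuple:
    """Hypothetical helper: processor (row, col) holding block (i, j)
    under a block cyclic layout on a P x Q template (assumed mapping)."""
    return (i % P, j % Q)


def transpose_schedule(P: int, Q: int) -> list:
    """Hypothetical helper: group the L wrapped diagonals of an L x L tile of
    blocks (L = LCM of P and Q) into L/G steps (G = GCD of P and Q).

    Wrapped diagonal d holds blocks (i, (i + d) % L), i = 0..L-1.
    Step s handles diagonals d = s*G + r for r = 0..G-1 (assumed grouping).
    Each step is a list of (src, dst, block) triples.
    """
    G, L = gcd(P, Q), lcm(P, Q)
    steps = []
    for s in range(L // G):
        moves = []
        for r in range(G):
            d = s * G + r
            for i in range(L):
                j = (i + d) % L
                src = owner(i, j, P, Q)
                # Block (i, j) of A becomes block (j, i) of A^T,
                # which lives on the owner of position (j, i).
                dst = owner(j, i, P, Q)
                moves.append((src, dst, (i, j)))
        steps.append(moves)
    return steps


if __name__ == "__main__":
    steps = transpose_schedule(4, 6)  # GCD = 2, LCM = 12
    print(len(steps))  # 6
    for moves in steps:
        # Every processor sends exactly one block and receives exactly one block.
        assert len({src for src, _, _ in moves}) == 4 * 6
        assert len({dst for _, dst, _ in moves}) == 4 * 6
```

With P = 4 and Q = 6 this yields LCM/GCD = 6 steps, and in each step all 24 processors send and receive exactly one block, because diagonals with different residues mod GCD occupy disjoint sets of processors. With relatively prime P = 3 and Q = 4 there are 12 steps, and over them processor (0, 0) sends one block to each of the 12 processors (including a local copy to itself), which is the complete exchange pattern.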
Pages: 1387 - 1405
Number of pages: 19
Related papers (50 in total)
  • [41] Nonlinear structural analysis on distributed-memory computers
    Watson, B.C.
    Noor, A.K.
    Computers and Structures, 1996, 58 (02) : 233 - 247
  • [42] IMPLEMENTING AN ODE CODE ON DISTRIBUTED-MEMORY COMPUTERS
    BURRAGE, K
    POHL, B
    COMPUTERS & MATHEMATICS WITH APPLICATIONS, 1994, 28 (10-12) : 235 - 252
  • [43] Nonlinear structural analysis on distributed-memory computers
    Watson, BC
    Noor, AK
    COMPUTERS & STRUCTURES, 1996, 58 (02) : 233 - 247
  • [44] Distributed-Memory Algorithms for Maximal Cardinality Matching using Matrix Algebra
    Azad, Ariful
    Buluc, Aydin
    2015 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING - CLUSTER 2015, 2015 : 398 - 407
  • [45] Parallel ILP for distributed-memory architectures
    Nuno A. Fonseca
    Ashwin Srinivasan
    Fernando Silva
    Rui Camacho
    Machine Learning, 2009, 74 : 257 - 279
  • [46] Parallel algorithms for bipartite matching problems on distributed memory computers
    Langguth, Johannes
    Patwary, Md. Mostofa Ali
    Manne, Fredrik
    PARALLEL COMPUTING, 2011, 37 (12) : 820 - 845
  • [47] PARALLEL ANNEALING ON DISTRIBUTED-MEMORY SYSTEMS
    LEE, FH
    STILES, GS
    SWAMINATHAN, V
    PROGRAMMING AND COMPUTER SOFTWARE, 1995, 21 (01) : 1 - 8
  • [48] Efficient algorithms for data distribution on distributed memory parallel computers
    Lee, PZ
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 1997, 8 (08) : 825 - 839
  • [49] Numerical simulations of polymer flooding process in porous media on distributed-memory parallel computers
    Zhong, He
    Liu, Hui
    Cui, Tao
    Chen, Zhangxin
    Shen, Lihua
    Yang, Bo
    He, Ruijian
    Guo, Xiaohu
    JOURNAL OF COMPUTATIONAL PHYSICS, 2020, 400 (400)
  • [50] The design, implementation, and evaluation of a symmetric banded linear solver for distributed-memory parallel computers
    Gupta, A
    Gustavson, FG
    Joshi, M
    Toledo, S
    ACM TRANSACTIONS ON MATHEMATICAL SOFTWARE, 1998, 24 (01) : 74 - 101