PARALLEL MATRIX TRANSPOSE ALGORITHMS ON DISTRIBUTED-MEMORY CONCURRENT COMPUTERS

Cited by: 24

Authors:

CHOI, JY

DONGARRA, JJ

WALKER, DW

Affiliations:

[1] OAK RIDGE NATL LAB,MATH SCI SECT,OAK RIDGE,TN 37831

[2] UNIV TENNESSEE,DEPT COMP SCI,KNOXVILLE,TN 37996

Source:

PARALLEL COMPUTING | 1995 / Vol. 21 / No. 09

Keywords:

LINEAR ALGEBRA; MATRIX TRANSPOSE ALGORITHM; DISTRIBUTED MEMORY MULTIPROCESSORS; POINT-TO-POINT COMMUNICATION; INTEL TOUCHSTONE DELTA;

DOI:

10.1016/0167-8191(95)00016-H

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods];

Discipline classification code:

081202 ;

Abstract:

This paper describes parallel matrix transpose algorithms on distributed memory concurrent processors. We assume that the matrix is distributed over a P × Q processor template with a block cyclic data distribution. P, Q, and the block size can be arbitrary, so the algorithms have wide applicability. The communication schemes of the algorithms are determined by the greatest common divisor (GCD) of P and Q. If P and Q are relatively prime, the matrix transpose algorithm involves complete exchange communication. If P and Q are not relatively prime, processors are divided into GCD groups and the communication operations are overlapped for different groups of processors. Processors transpose GCD wrapped diagonal blocks simultaneously, and the matrix can be transposed with LCM/GCD steps, where LCM is the least common multiple of P and Q. The algorithms make use of non-blocking, point-to-point communication between processors. The use of non-blocking communication allows a processor to overlap the messages that it sends to different processors, thereby avoiding unnecessary synchronization. Combined with the matrix multiplication routine, C = A · B, the algorithms are used to compute parallel multiplications of transposed matrices, C = A^T · B^T, in the PUMMA package [5]. Details of the parallel implementation of the algorithms are given, and results are presented for runs on the Intel Touchstone Delta computer.
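
The role of the GCD and LCM in the abstract can be illustrated with a small counting sketch. It is not the paper's implementation: it assumes the usual block cyclic convention in which block (I, J) of the matrix is owned by processor (I mod P, J mod Q), and the function names `owner`, `transpose_destinations`, and `communication_summary` are hypothetical. Under that assumption, the number of distinct processors to which one processor must send blocks equals LCM/GCD, and it equals P · Q (every processor, i.e. a complete exchange) when P and Q are relatively prime.

```python
from math import gcd


def owner(block_row, block_col, P, Q):
    """Processor owning a block under the assumed block cyclic mapping."""
    return (block_row % P, block_col % Q)


def transpose_destinations(p, q, P, Q):
    """Distinct processors that receive blocks from processor (p, q).

    Block (I, J) of A becomes block (J, I) of the transpose, so it moves
    from owner(I, J) to owner(J, I). The ownership pattern repeats every
    LCM blocks, so one LCM x LCM window of blocks is enough to enumerate.
    """
    lcm = P * Q // gcd(P, Q)
    dests = set()
    for I in range(p, lcm, P):        # block rows held by processor row p
        for J in range(q, lcm, Q):    # block columns held by processor column q
            dests.add(owner(J, I, P, Q))
    return dests


def communication_summary(P, Q):
    """GCD, LCM, and the LCM/GCD step count named in the abstract."""
    g = gcd(P, Q)
    lcm = P * Q // g
    return {"gcd": g, "lcm": lcm, "steps": lcm // g, "complete_exchange": g == 1}


if __name__ == "__main__":
    for P, Q in [(2, 3), (4, 6), (4, 4)]:
        info = communication_summary(P, Q)
        n_dest = len(transpose_destinations(0, 0, P, Q))
        print(P, Q, info, "destinations of (0, 0):", n_dest)
```

For a 4 × 6 template, for example, the GCD is 2 and the LCM is 12, so each processor exchanges blocks with 6 of the 24 processors rather than with all of them.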


Pages: 1387 - 1405

Number of pages: 19

50 items in total

[41] Nonlinear structural analysis on distributed-memory computers
Watson, B.C.
Noor, A.K.
Computers and Structures, 1996, 58 (02) : 233 - 247
[42] IMPLEMENTING AN ODE CODE ON DISTRIBUTED-MEMORY COMPUTERS
BURRAGE, K
POHL, B
COMPUTERS & MATHEMATICS WITH APPLICATIONS, 1994, 28 (10-12) : 235 - 252
[43] Nonlinear structural analysis on distributed-memory computers
Watson, BC
Noor, AK
COMPUTERS & STRUCTURES, 1996, 58 (02) : 233 - 247
[44] Distributed-Memory Algorithms for Maximal Cardinality Matching using Matrix Algebra
Azad, Ariful
Buluc, Aydin
2015 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING - CLUSTER 2015, 2015 : 398 - 407
[45] Parallel ILP for distributed-memory architectures
Nuno A. Fonseca
Ashwin Srinivasan
Fernando Silva
Rui Camacho
Machine Learning, 2009, 74 : 257 - 279
[46] Parallel algorithms for bipartite matching problems on distributed memory computers
Langguth, Johannes
Patwary, Md. Mostofa Ali
Manne, Fredrik
PARALLEL COMPUTING, 2011, 37 (12) : 820 - 845
[47] PARALLEL ANNEALING ON DISTRIBUTED-MEMORY SYSTEMS
LEE, FH
STILES, GS
SWAMINATHAN, V
PROGRAMMING AND COMPUTER SOFTWARE, 1995, 21 (01) : 1 - 8
[48] Efficient algorithms for data distribution on distributed memory parallel computers
Lee, PZ
IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 1997, 8 (08) : 825 - 839
[49] Numerical simulations of polymer flooding process in porous media on distributed-memory parallel computers
Zhong, He
Liu, Hui
Cui, Tao
Chen, Zhangxin
Shen, Lihua
Yang, Bo
He, Ruijian
Guo, Xiaohu
JOURNAL OF COMPUTATIONAL PHYSICS, 2020, 400 (400)
[50] The design, implementation, and evaluation of a symmetric banded linear solver for distributed-memory parallel computers
Gupta, A
Gustavson, FG
Joshi, M
Toledo, S
ACM TRANSACTIONS ON MATHEMATICAL SOFTWARE, 1998, 24 (01) : 74 - 101
