PARALLEL MATRIX TRANSPOSE ALGORITHMS ON DISTRIBUTED-MEMORY CONCURRENT COMPUTERS

Cited by: 24
Authors
CHOI, JY
DONGARRA, JJ
WALKER, DW
Affiliations
[1] OAK RIDGE NATL LAB,MATH SCI SECT,OAK RIDGE,TN 37831
[2] UNIV TENNESSEE,DEPT COMP SCI,KNOXVILLE,TN 37996
Keywords
LINEAR ALGEBRA; MATRIX TRANSPOSE ALGORITHM; DISTRIBUTED MEMORY MULTIPROCESSORS; POINT-TO-POINT COMMUNICATION; INTEL TOUCHSTONE DELTA;
DOI
10.1016/0167-8191(95)00016-H
CLC Classification
TP301 [Theory, Methods];
Subject Classification
081202 ;
Abstract
This paper describes parallel matrix transpose algorithms on distributed memory concurrent processors. We assume that the matrix is distributed over a P × Q processor template with a block cyclic data distribution. P, Q, and the block size can be arbitrary, so the algorithms have wide applicability. The communication schemes of the algorithms are determined by the greatest common divisor (GCD) of P and Q. If P and Q are relatively prime, the matrix transpose algorithm involves complete exchange communication. If P and Q are not relatively prime, processors are divided into GCD groups and the communication operations are overlapped for different groups of processors. Processors transpose GCD wrapped diagonal blocks simultaneously, and the matrix can be transposed in LCM/GCD steps, where LCM is the least common multiple of P and Q. The algorithms make use of non-blocking, point-to-point communication between processors. The use of non-blocking communication allows a processor to overlap the messages that it sends to different processors, thereby avoiding unnecessary synchronization. Combined with the matrix multiplication routine, C = A · B, the algorithms are used to compute parallel multiplications of transposed matrices, C = A^T · B^T, in the PUMMA package [5]. Details of the parallel implementation of the algorithms are given, and results are presented for runs on the Intel Touchstone Delta computer.
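The block-cyclic mapping and the LCM/GCD step count in the abstract can be illustrated with a short sketch. This is not code from the paper: the `owner` and `transpose_dest` helpers and the P = 4, Q = 6 template sizes are illustrative assumptions, showing only where a block's data must travel when the matrix is transposed and how GCD and LCM of P and Q determine the schedule.

```python
from math import gcd

def owner(i, j, P, Q):
    """Block-cyclic distribution: block (i, j) lives on processor
    (i mod P, j mod Q) of the P x Q processor template."""
    return (i % P, j % Q)

def transpose_dest(i, j, P, Q):
    """Block (i, j) of A becomes block (j, i) of A^T, so its data
    must travel from owner(i, j) to owner(j, i)."""
    return owner(j, i, P, Q)

# Illustrative template sizes (not taken from the paper's experiments).
P, Q = 4, 6
g = gcd(P, Q)          # processors split into GCD groups
lcm = P * Q // g       # period of the block-cyclic communication pattern
steps = lcm // g       # LCM/GCD communication steps, per the abstract

print(g, lcm, steps)   # prints: 2 12 6

# The communication pattern repeats with period LCM in each block index:
assert owner(3 + lcm, 5 + lcm, P, Q) == owner(3, 5, P, Q)
```

When P and Q are relatively prime, g = 1 and steps = P·Q, i.e. every processor must eventually exchange data with every other processor (the complete-exchange case); a larger common divisor shortens the schedule and lets the GCD groups proceed concurrently.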
Pages: 1387 - 1405
Page count: 19
Related Papers
50 total
  • [31] Scalable parallel matrix multiplication on distributed memory parallel computers
    Li, KQ
    JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2001, 61 (12) : 1709 - 1731
  • [33] Parallel H-matrix arithmetic on distributed-memory systems
    Izadi, Mohammad
    COMPUTING AND VISUALIZATION IN SCIENCE, 2012, 15 (02) : 87 - 97
  • [34] Cache blocking of distributed-memory parallel matrix power kernels
    Lacey, Dane
    Alappat, Christie
    Lange, Florian
    Hager, Georg
    Fehske, Holger
    Wellein, Gerhard
    INTERNATIONAL JOURNAL OF HIGH PERFORMANCE COMPUTING APPLICATIONS, 2025,
  • [35] Distributed-Memory Parallel JointNMF
    Eswar, Srinivas
    Cobb, Benjamin
    Hayashi, Koby
    Kannan, Ramakrishnan
    Ballard, Grey
    Vuduc, Richard
    Park, Haesun
    PROCEEDINGS OF THE 37TH INTERNATIONAL CONFERENCE ON SUPERCOMPUTING, ACM ICS 2023, 2023, : 301 - 312
  • [36] A FULLY PARALLEL CONDENSATION METHOD FOR GENERALIZED EIGENVALUE PROBLEMS ON DISTRIBUTED-MEMORY COMPUTERS
    ROTHE, K
    VOSS, H
    PARALLEL COMPUTING, 1995, 21 (06) : 907 - 921
  • [37] Efficient all-to-all broadcast schemes in distributed-memory parallel computers
    Oh, ES
    Kanj, IA
    16TH ANNUAL INTERNATIONAL SYMPOSIUM ON HIGH PERFORMANCE COMPUTING SYSTEMS AND APPLICATIONS, PROCEEDINGS, 2002, : 71 - 76
  • [38] Implementation of multiple-precision parallel division and square root on distributed-memory parallel computers
    Takahashi, D
    2000 INTERNATIONAL WORKSHOPS ON PARALLEL PROCESSING, PROCEEDINGS, 2000, : 229 - 235
  • [39] A framework for generating distributed-memory parallel programs for block recursive algorithms
    Gupta, SKS
    Huang, CH
    Sadayappan, P
    Johnson, RW
    JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 1996, 34 (02) : 137 - 153
  • [40] Comparison of backfilling algorithms for job scheduling in distributed-memory parallel systems
    Department of Computer Science, Bowling Green State University, Bowling Green, OH 43403
    Comput. Educ. J., 2007, (4): 22 - 31