Architecture scalability of parallel vector computers with a shared memory

Cited by: 2

Authors:

Dekker, E [1]

Affiliation:

[1] Delft Univ Technol, Fac Informat Technol & Syst, NL-2628 CD Delft, Netherlands

Source:

IEEE TRANSACTIONS ON COMPUTERS | 1998 / Vol. 47 / No. 05

Keywords:

architecture scalability; parallel vector computers; shared memory; sustainable peak performance; theoretical peak performance;

DOI:

10.1109/12.677257

Chinese Library Classification:

TP3 [Computing technology, computer technology];

Subject classification code:

0812 ;

Abstract:

Based on a model of a parallel vector computer with a shared memory, its scalability properties are derived. The processor-memory interconnection network is assumed to be composed of crossbar switches of size b × b. This paper analyzes sustainable peak performance under optimal conditions, i.e., no memory bank conflicts, sufficient processor-memory bank pathways, and no interconnection network conflicts. It will be shown that, with fully vectorizable algorithms and no communication overhead, the sustainable peak performance does not scale up linearly with the number of processors p. If the interconnection network is unbuffered, the number of memory banks must increase at least with O(p log_b p) to sustain peak performance. If the network is buffered, this bottleneck can be alleviated; however, the half performance vector length still increases with O(log_b p). The paper confirms the validity of the model by examining the performance behavior of the LINPACK benchmark.
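
The two growth terms quoted in the abstract can be tabulated with a short sketch. It is only an illustration, not the paper's model: the function names are hypothetical, the constants hidden by the O-notation are not given in the abstract and are omitted, and the reading of the logarithmic term as the number of stages in a multistage network of b × b switches is an assumption that the abstract does not state explicitly.

```python
def network_stages(p: int, b: int) -> int:
    """Smallest integer s with b**s >= p, i.e. ceil(log_b p).

    Assumption: this is the stage count of a multistage network built
    from b x b switches connecting p processors. Integer arithmetic is
    used to avoid floating-point rounding in the logarithm.
    """
    if p < 1 or b < 2:
        raise ValueError("need p >= 1 and b >= 2")
    stages, reach = 0, 1
    while reach < p:
        reach *= b
        stages += 1
    return stages


def bank_growth_term(p: int, b: int) -> int:
    """The p * log_b p term from the abstract's lower bound on memory
    banks for an unbuffered network (constant factor unknown, omitted)."""
    return p * network_stages(p, b)


if __name__ == "__main__":
    b = 4  # hypothetical switch size, chosen only for illustration
    for p in (4, 16, 64, 256):
        print(p, network_stages(p, b), bank_growth_term(p, b))
```

With these assumptions the bank term grows faster than p itself, which matches the abstract's statement that sustainable peak performance does not scale linearly with the number of processors.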


Pages: 614 - 624

Page count: 11


[41] A new parallel DSP with short-vector memory architecture
Fridman, J
Anderson, WC
ICASSP '99: 1999 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, PROCEEDINGS VOLS I-VI, 1999, : 2139 - 2142
[42] Improved SSOR and Incomplete Cholesky Solution of Linear Equations on Shared Memory and Distributed Memory Parallel Computers
Joubert, Wayne
Oppe, Thomas
NUMERICAL LINEAR ALGEBRA WITH APPLICATIONS, 1994, 1 (03) : 287 - 311
[43] Optimizing a Parallel Video Encoder with Message Passing and a Shared Memory Architecture
谷俊丽
孙义和
Tsinghua Science and Technology, 2011, 16 (04) : 393 - 398
[44] A parallel system architecture based on dynamically configurable shared memory clusters
Tudruj, M
Masko, L
PARALLEL PROCESSING APPLIED MATHEMATICS, 2002, 2328 : 51 - 61
[45] PERFORMANCE ANALYSIS OF THE FFT ALGORITHM ON A SHARED-MEMORY PARALLEL ARCHITECTURE
CVETANOVIC, Z
IBM JOURNAL OF RESEARCH AND DEVELOPMENT, 1987, 31 (04) : 435 - 451
[46] Implementation of a primal–dual method for SDP on a shared memory parallel architecture
Brian Borchers
Joseph G. Young
Computational Optimization and Applications, 2007, 37 : 355 - 369
[47] Optimizing a parallel video encoder with message passing and a shared memory architecture
Gu J.
Sun Y.
Tsinghua Science and Technology, 2011, 16 (04) : 393 - 398
[48] SHARED MEMORY, VECTORS, MESSAGE PASSING, AND SCALABILITY
SMITH, BJ
LECTURE NOTES IN COMPUTER SCIENCE, 1988, 295 : 29 - 34
[49] Time-accurate implicit ALE algorithm for shared-memory parallel computers
Sharov, D
Luo, H
Baum, JD
Löhner, R
COMPUTATIONAL FLUID DYNAMICS 2000, 2001, : 387 - 392
[50] The computational complexity of the Quadrant Interlocking (QI) iterative methods on shared memory parallel computers
Evans, DJ
Abdullah, R
INTERNATIONAL JOURNAL OF COMPUTER MATHEMATICS, 1998, 67 (3-4) : 391 - 410
