Localized algorithms for VLSI processor arrays

Citations: 0

Authors:

Evans, DJ [1]

Gusev, M

Affiliations:

[1] Univ Technol Loughborough, Parallel Algorithms Res Ctr, Loughborough, Leics, England

[2] Univ Kiril & Metodij Skopje, PMF Inst Informat, Skopje 91000, North Macedonia

Source:

INTERNATIONAL JOURNAL OF COMPUTER MATHEMATICS | 2000 / Vol. 75 / No. 02

Keywords:

computational broadcast elimination; data broadcast elimination; data dependence; algorithm transformation; linear insertion and bubble sort; QR decomposition algorithm;

DOI:

10.1080/00207160008804974

Chinese Library Classification:

O29 [Applied Mathematics];

Discipline classification code:

070104 ;

Abstract:

In this paper we analyze algorithms expressed as systems of recurrence equations. An algorithm is called a 2*1 output algorithm if the system of recurrence equations specifies two output values of one function (variable identification) for each index point in the algorithm. The algorithm is in free form if the indexes of these two values are not dependent. Two standard classes are determined by this criterion: the nearest neighbour form and the all pair form. For example, a sorting algorithm can be expressed in the all pair form, as in the linear insertion algorithm, or in the nearest neighbour form, as in the bubble sort algorithm. However, these algorithms are different in nature. A procedure to eliminate the computational broadcast for the all pair 2*1 output algorithm has been proposed by the authors in [1]. Applying this procedure eliminates both the computational and the data broadcast and yields a localized form of the algorithm, expressed as a system of uniform recurrence equations, so the data dependence method can be used efficiently for parallel implementations. The proposed procedure cannot be applied directly to algorithms in the nearest neighbour form. Here we show how such an algorithm can be restructured into a form where the computational and data broadcast can be eliminated. These transformations result in localized algorithms. A few examples show how these algorithms can be implemented on processor arrays. For example, the Gentleman-Kung triangular array [2] can be used to implement the QR decomposition algorithm for both forms of the algorithm. The implementations differ in the order of the data flow and the processor operation. We show that the implementation of the nearest neighbour algorithm is even better than the standard one.
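The distinction between the two forms can be pictured with a small sorting sketch. The code below is only an illustration under stated assumptions, not the authors' recurrence-equation formulation or their transformation procedure: the function names `compare_exchange`, `sort_nearest_neighbour`, and `sort_all_pair` are hypothetical, and the all pair loop is a simple exchange-style stand-in rather than the linear insertion algorithm analyzed in the paper. Each compare-exchange produces two output values (a minimum and a maximum), which loosely mirrors the idea of two outputs per index point.

```python
def compare_exchange(x, y):
    """One operation with two outputs: (smaller, larger)."""
    return (x, y) if x <= y else (y, x)


def sort_nearest_neighbour(values):
    """Bubble-sort style: every operation touches adjacent positions j, j+1."""
    a = list(values)
    n = len(a)
    for i in range(n - 1):
        for j in range(n - 1 - i):
            a[j], a[j + 1] = compare_exchange(a[j], a[j + 1])
    return a


def sort_all_pair(values):
    """All pair style: every pair (i, j) with i < j is compared once.

    Position i takes part in every operation of the inner loop, which is
    an informal way to see where non-local communication can arise.
    """
    a = list(values)
    n = len(a)
    for i in range(n - 1):
        for j in range(i + 1, n):
            a[i], a[j] = compare_exchange(a[i], a[j])
    return a


if __name__ == "__main__":
    data = [5, 2, 4, 1, 3]
    print(sort_nearest_neighbour(data))
    print(sort_all_pair(data))
```

Both functions sort the input; they differ only in which index pairs interact, which is the property the two classes in the abstract are distinguished by.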

Pages: 149-166

Number of pages: 18
