Localized algorithms for VLSI processor arrays

Authors
Evans, DJ [1]
Gusev, M [2]
Affiliations
[1] Univ Technol Loughborough, Parallel Algorithms Res Ctr, Loughborough, Leics, England
[2] Univ Kiril & Metodij Skopje, PMF Inst Informat, Skopje 91000, North Macedonia
Keywords
computational broadcast elimination; data broadcast elimination; data dependence; algorithm transformation; linear insertion and bubble sort; QR decomposition algorithm
DOI
10.1080/00207160008804974
Chinese Library Classification
O29 [Applied Mathematics]
Discipline classification code
070104
Abstract
In this paper we analyze algorithms expressed as systems of recurrence equations. An algorithm is called a 2*1 output algorithm if the system of recurrence equations specifies two output values of one function (variable identification) for each index point of the algorithm. The algorithm is in free form if the indexes of these two values are not dependent. Two standard classes are determined by this criterion: the nearest neighbour form and the all pair form. For example, a sorting algorithm can be expressed in the all pair form, i.e., the linear insertion algorithm, or in the nearest neighbour form, i.e., the bubble sort algorithm. However, these algorithms differ in nature. A procedure to eliminate the computational broadcast for all pair 2*1 output algorithms was proposed by the authors in [1]. Applying this procedure eliminates the computational and data broadcast and yields a localized form of the algorithm and a system of uniform recurrence equations, so the data dependence method can be used efficiently for parallel implementations. The proposed procedure cannot be applied directly to algorithms in the nearest neighbour form. Here we show how such an algorithm can be restructured into a form in which the computational and data broadcast can be eliminated. These transformations result in localized algorithms. A few examples show how these algorithms can be implemented on processor arrays. For example, the Gentleman-Kung triangular array [2] can be used to compute the QR decomposition for both forms of the algorithm. The implementations differ in the order of the data flow and the processor operation. We show that the implementation of the nearest neighbour algorithm is even better than the standard one.
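The distinction between the two forms can be illustrated with a minimal sketch. It is not taken from the paper: the function names are hypothetical, and a plain exchange-style loop stands in for the all pair form, so the authors' linear insertion formulation may differ in detail. The compare-exchange step returns two values of one function at each index point, which is one reading of "2*1 output".

```python
def compare_exchange(a, b):
    """Return (min, max): two outputs of one function at one index point."""
    return (a, b) if a <= b else (b, a)


def sort_all_pair(values):
    """All pair form (assumed reading): indexes i and j are independent,
    and every pair (i, j) with i < j is compared once."""
    x = list(values)
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            # x[i] takes part in every step of this inner loop.
            x[i], x[j] = compare_exchange(x[i], x[j])
    return x


def sort_nearest_neighbour(values):
    """Nearest neighbour form (bubble sort): only adjacent pairs (j, j + 1)."""
    x = list(values)
    n = len(x)
    for i in range(n - 1):
        for j in range(n - 1 - i):
            x[j], x[j + 1] = compare_exchange(x[j], x[j + 1])
    return x
```

In the all pair loop the value at index i is needed by every inner step, which is plausibly the kind of broadcast the abstract refers to; the nearest neighbour loop touches only adjacent elements.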
Pages: 149-166
Page count: 18