Advances in technology have produced computing environments that make it feasible to tackle demanding scientific applications on a larger scale. Innovative applications like text recognition and image processing rely on computationally intensive operations requiring massive parallelism (for example, large-matrix multiplication, feature extraction, and cluster analysis). Systolic arrays are well suited to such computationally intensive applications. Falling into an area between vector computers and massively parallel computers, systolic arrays typically combine intensive local communication and computation with decentralized parallelism in a compact package. This article chronicles the extension of systolic array architecture from fixed- or special-purpose architectures to general-purpose SIMD (single-instruction stream, multiple-data stream) and MIMD (multiple-instruction stream, multiple-data stream) architectures, and, more recently, to hybrid architectures that combine commercial and FPGA (field-programmable gate array) technologies. The authors present a taxonomy for systolic organizations (special purpose, programmable, reconfigurable, and hybrid), discuss how each architecture exploits concurrency, and compare the performance attributes of each. They also describe several implementation issues that determine a systolic array's performance efficiency (algorithms and mapping, system integration through memory subsystems, cell granularity, and extensibility to a wide variety of topologies, among others). The authors predict that, with technological advances, future systolic architectures will be based on reconfigurable FPGA architecture. They argue that general-purpose systolic arrays should not be overlooked as a solution to the intensive computational performance requirements of tomorrow's applications.
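To make the idea of local communication combined with decentralized parallelism concrete, the sketch below simulates a small systolic array performing matrix multiplication, one of the operations named above. It is only an illustration under stated assumptions: it uses an output-stationary dataflow (each cell keeps its own running sum while operands flow right and down), which is one common textbook arrangement and not necessarily any design covered in the article. The function name `systolic_matmul` and the register names are hypothetical.

```python
def systolic_matmul(A, B):
    """Cycle-by-cycle simulation of an n x m grid of cells computing C = A * B.

    Assumed dataflow (illustrative only): rows of A enter from the left edge,
    columns of B enter from the top edge, and each cell talks only to its
    left and upper neighbors.
    """
    n, k, m = len(A), len(A[0]), len(B[0])
    if len(B) != k:
        raise ValueError("inner dimensions of A and B must match")

    acc = [[0] * m for _ in range(n)]    # per-cell accumulator (stays in place)
    a_reg = [[0] * m for _ in range(n)]  # operand register moving left to right
    b_reg = [[0] * m for _ in range(n)]  # operand register moving top to bottom

    # The last cell (n-1, m-1) receives its final operands at cycle n + m + k - 3.
    for t in range(n + m + k - 2):
        # Shift phase: visit cells from bottom-right to top-left so that each
        # cell reads its neighbor's value from the previous cycle.
        for i in reversed(range(n)):
            for j in reversed(range(m)):
                if j > 0:
                    a_reg[i][j] = a_reg[i][j - 1]
                else:
                    s = t - i  # row i of A is delayed (skewed) by i cycles
                    a_reg[i][0] = A[i][s] if 0 <= s < k else 0
                if i > 0:
                    b_reg[i][j] = b_reg[i - 1][j]
                else:
                    s = t - j  # column j of B is delayed (skewed) by j cycles
                    b_reg[0][j] = B[s][j] if 0 <= s < k else 0
        # Compute phase: every cell does one multiply-accumulate in parallel.
        for i in range(n):
            for j in range(m):
                acc[i][j] += a_reg[i][j] * b_reg[i][j]
    return acc


if __name__ == "__main__":
    print(systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
    # → [[19, 22], [43, 50]]
```

The input skew is what lets matching operands meet in the right cell at the right cycle without any global routing: cell (i, j) sees A[i][s] and B[s][j] together at cycle s + i + j. In hardware the compute phase would run concurrently in all cells; the nested loops here merely emulate that in sequence.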