Retargeting sequential image-processing programs for data parallel execution

Cited by: 4
Authors
Baumstark, LB [1]
Wills, LM
Affiliations
[1] State Univ W Georgia, Dept Comp Sci, Carrollton, GA 30118 USA
[2] Georgia Inst Technol, Sch Elect & Comp Engn, Atlanta, GA 30332 USA
Funding
U.S. National Science Foundation
Keywords
Reengineering; SIMD processors; data-level parallelization; explicitly parallel program representation; program recognition
DOI
10.1109/TSE.2005.26
Chinese Library Classification number
TP31 [Computer software]
Subject classification codes
081202; 0835
Abstract
New compact, low-power implementation technologies for processors and imaging arrays can enable a new generation of portable video products. However, software compatibility with large bodies of existing applications written in C prevents more efficient, higher performance data parallel architectures from being used in these embedded products. If this software could be automatically retargeted explicitly for data parallel execution, product designers could incorporate these architectures into embedded products. The key challenge is exposing the parallelism that is inherent in these applications but that is obscured by artifacts imposed by sequential programming languages. This paper presents a recognition-based approach for automatically extracting a data parallel program model from sequential image processing code and retargeting it to data parallel execution mechanisms. The explicitly parallel model presented, called multidimensional data flow (MDDF), captures a model of how operations on data regions (e.g., rows, columns, and tiled blocks) are composed and interact. To extract an MDDF model, a partial recognition technique is used that focuses on identifying array access patterns in loops, transforming only those program elements that hinder parallelization, while leaving the core algorithmic computations intact. The paper presents results of retargeting a set of production programs to a representative data parallel processor array to demonstrate the capacity to extract parallelism using this technique. The retargeted applications yield a potential execution throughput limited only by the number of processing elements, exceeding thousands of instructions per cycle in massively parallel implementations.
Pages: 116-136
Page count: 21