A parametrized loop fusion algorithm for improving parallelism and cache locality

Cited by: 27

Authors:

Singhai, SK [1]

McKinley, KS [1]

Affiliations:

[1] Univ Massachusetts, Dept Comp Sci, Amherst, MA 01003 USA

Source:

COMPUTER JOURNAL | 1997, Vol. 40, No. 6

DOI:

10.1093/comjnl/40.6.340

Chinese Library Classification:

TP3 [Computing technology, computer technology];

Subject classification code:

0812;

Abstract:

Loop fusion is a reordering transformation that merges multiple loops into a single loop. It can increase data locality and the granularity of parallel loops, thus improving program performance. Previous approaches to this problem have looked at these two benefits in isolation. In this work, we propose a new model that considers data locality, parallelism and register pressure together. We build a weighted directed acyclic graph in which the nodes represent program loops along with their register pressure, and the edges represent the amount of locality and parallelism present. The direction of an edge represents an execution order constraint. We then partition the graph into components such that the sum of the weights on the edges cut is minimized, subject to two constraints: the nodes in the same partition can be safely fused together, and the register pressure of the combined loop does not exceed the number of available registers. Previous work demonstrates that the general problem of finding optimal partitions is NP-hard. In restricted cases, we show that it is possible to arrive at the optimal solution. We give an algorithm for the restricted case and a heuristic for the general case. We demonstrate the effectiveness of fusion and our approach with experimental results.
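
The partitioning problem described in the abstract can be illustrated with a small sketch. This is not the algorithm or the heuristic from the paper, only a simple greedy stand-in written for illustration. The function name `greedy_fuse`, the edge encoding `(weight, fusible)`, and the assumption that the register pressure of a fused loop is the sum of its members' pressures are all hypothetical choices made here.

```python
def greedy_fuse(pressure, edges, max_regs):
    """Greedy sketch of fusion-graph partitioning (NOT the paper's algorithm).

    pressure: {loop: registers needed}              (hypothetical encoding)
    edges:    {(src, dst): (weight, fusible)}       src must execute before dst
    max_regs: number of available registers
    Returns (group, cut): loop -> representative loop, and total weight cut.
    """
    group = {n: n for n in pressure}  # every loop starts in its own partition

    def acyclic(grp):
        # Condense loops into partitions, then run Kahn's algorithm.
        succ = {g: set() for g in set(grp.values())}
        indeg = {g: 0 for g in succ}
        for (u, v) in edges:
            gu, gv = grp[u], grp[v]
            if gu != gv and gv not in succ[gu]:
                succ[gu].add(gv)
                indeg[gv] += 1
        ready = [g for g in indeg if indeg[g] == 0]
        seen = 0
        while ready:
            g = ready.pop()
            seen += 1
            for h in succ[g]:
                indeg[h] -= 1
                if indeg[h] == 0:
                    ready.append(h)
        return seen == len(succ)

    # Try to keep the heaviest edges inside a partition first.
    for (u, v), (w, _) in sorted(edges.items(), key=lambda e: -e[1][0]):
        gu, gv = group[u], group[v]
        if gu == gv:
            continue
        trial = {n: (gu if g == gv else g) for n, g in group.items()}
        merged = [n for n in trial if trial[n] == gu]
        # Assumption: pressure of a fused loop is the sum of its members.
        if sum(pressure[n] for n in merged) > max_regs:
            continue
        # No fusion-preventing edge may end up inside a single partition.
        if any(not f and trial[a] == trial[b] for (a, b), (_, f) in edges.items()):
            continue
        # The partitions must still admit a legal execution order.
        if not acyclic(trial):
            continue
        group = trial

    cut = sum(w for (a, b), (w, _) in edges.items() if group[a] != group[b])
    return group, cut


# Hypothetical example: four loops and ten available registers.
pressure = {"L1": 4, "L2": 4, "L3": 6, "L4": 3}
edges = {
    ("L1", "L2"): (5, True),
    ("L2", "L3"): (3, True),
    ("L1", "L3"): (2, False),  # fusion-preventing dependence
    ("L3", "L4"): (4, True),
}
group, cut = greedy_fuse(pressure, edges, max_regs=10)
print(group, cut)
```

In this made-up example the sketch fuses L1 with L2 and L3 with L4, and the two remaining edges between the partitions are cut. A greedy pass like this gives no optimality guarantee, which is consistent with the abstract's remark that the general problem is NP-hard.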

Pages: 340-355

Page count: 16
