A parametrized loop fusion algorithm for improving parallelism and cache locality

Cited by: 27
Authors
Singhai, SK [1]
McKinley, KS [1]
Affiliations
[1] Univ Massachusetts, Dept Comp Sci, Amherst, MA 01003 USA
Source
COMPUTER JOURNAL | 1997 / Vol. 40 / No. 06
Keywords
DOI
10.1093/comjnl/40.6.340
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Subject classification code
0812;
Abstract
Loop fusion is a reordering transformation that merges multiple loops into a single loop. It can increase data locality and the granularity of parallel loops, thus improving program performance. Previous approaches to this problem have looked at these two benefits in isolation. In this work, we propose a new model which considers data locality, parallelism and register pressure together. We build a weighted directed acyclic graph in which the nodes represent program loops along with their register pressure, and the edges represent the amount of locality and parallelism present. The direction of an edge represents an execution order constraint. We then partition the graph into components such that the sum of the weights on the edges cut is minimized, subject to the constraint that the nodes in the same partition can be safely fused together, and the register pressure of the combined loop does not exceed the number of available registers. Previous work demonstrates that the general problem of finding optimal partitions is NP-hard. In restricted cases, we show that it is possible to arrive at the optimal solution. We give an algorithm for the restricted case and a heuristic for the general case. We demonstrate the effectiveness of fusion and our approach with experimental results.
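The partitioning problem described in the abstract can be illustrated with a small sketch. This is not the authors' algorithm or heuristic. It is a simple greedy edge-contraction scheme written only to make the constraints concrete. The function names, the input layout (`pressure`, `edges`, `preventing`), and the assumption that the register pressure of a fused loop is the sum of its members' pressures are all hypothetical simplifications.

```python
def has_cycle(nodes, arcs):
    """Kahn's algorithm: return True if the directed graph contains a cycle."""
    indeg = {n: 0 for n in nodes}
    succ = {n: [] for n in nodes}
    for u, v in arcs:
        succ[u].append(v)
        indeg[v] += 1
    ready = [n for n in nodes if indeg[n] == 0]
    seen = 0
    while ready:
        n = ready.pop()
        seen += 1
        for m in succ[n]:
            indeg[m] -= 1
            if indeg[m] == 0:
                ready.append(m)
    return seen != len(nodes)


def greedy_fusion_partition(pressure, edges, preventing, max_registers):
    """Hypothetical greedy sketch, not the paper's method.

    pressure:      {loop: register pressure}
    edges:         {(u, v): benefit weight}, u must execute before v
    preventing:    set of (u, v) ordering edges whose endpoints must not be fused
    max_registers: register budget for any fused loop
    """
    group = {n: n for n in pressure}  # loop -> representative of its partition

    def legal(a, b):
        trial = {n: (a if g == b else g) for n, g in group.items()}
        # (1) No fusion-preventing edge may end up inside one partition.
        if any(trial[u] == trial[v] for u, v in preventing):
            return False
        # (2) Register pressure of the fused loop (ASSUMED additive here).
        if sum(p for n, p in pressure.items() if trial[n] == a) > max_registers:
            return False
        # (3) The graph of partitions must stay acyclic to keep a valid order.
        arcs = {(trial[u], trial[v])
                for u, v in list(edges) + list(preventing)
                if trial[u] != trial[v]}
        return not has_cycle(set(trial.values()), arcs)

    # Try to keep the heaviest edges uncut by fusing their endpoints first.
    for (u, v), _w in sorted(edges.items(), key=lambda e: -e[1]):
        a, b = group[u], group[v]
        if a != b and legal(a, b):
            for n in group:
                if group[n] == b:
                    group[n] = a

    parts = {}
    for n, g in group.items():
        parts.setdefault(g, []).append(n)
    cut = sum(w for (u, v), w in edges.items() if group[u] != group[v])
    return sorted(sorted(p) for p in parts.values()), cut


# Four loops, each assumed to need 4 registers, with a budget of 8.
result = greedy_fusion_partition(
    {"L1": 4, "L2": 4, "L3": 4, "L4": 4},
    {("L1", "L2"): 10, ("L2", "L3"): 6, ("L3", "L4"): 8, ("L1", "L4"): 3},
    set(),
    8,
)
print(result)  # → ([['L1', 'L2'], ['L3', 'L4']], 9)
```

In this example the register budget stops the two fused pairs from merging further, so the edges of weight 6 and 3 are cut. A greedy pass like this gives no optimality guarantee, which is consistent with the abstract's remark that the general problem is NP-hard.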
Pages: 340-355
Page count: 16