A parametrized loop fusion algorithm for improving parallelism and cache locality

Cited by: 27
Authors
Singhai, SK [1]
McKinley, KS [1]
Affiliations
[1] Univ Massachusetts, Dept Comp Sci, Amherst, MA 01003 USA
Source
COMPUTER JOURNAL | 1997 / Vol. 40 / Issue 06
Keywords
DOI
10.1093/comjnl/40.6.340
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Loop fusion is a reordering transformation that merges multiple loops into a single loop. It can increase data locality and the granularity of parallel loops, thus improving program performance. Previous approaches to this problem have looked at these two benefits in isolation. In this work, we propose a new model which considers data locality, parallelism and register pressure together. We build a weighted directed acyclic graph in which the nodes represent program loops along with their register pressure, and the edges represent the amount of locality and parallelism present. The direction of an edge represents an execution order constraint. We then partition the graph into components such that the sum of the weights on the edges cut is minimized, subject to the constraint that the nodes in the same partition can be safely fused together, and the register pressure of the combined loop does not exceed the number of available registers. Previous work demonstrates that the general problem of finding optimal partitions is NP-hard. In restricted cases, we show that it is possible to arrive at the optimal solution. We give an algorithm for the restricted case and a heuristic for the general case. We demonstrate the effectiveness of fusion and our approach with experimental results.
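The partitioning problem described in the abstract can be illustrated with a small sketch. The code below is not the authors' algorithm or heuristic; it is a simple greedy stand-in that visits edges from heaviest to lightest and merges the two partitions when the merge looks legal. All names (`greedy_fuse`, `is_acyclic`, `bad_pairs`, `max_regs`) are hypothetical, and the sketch assumes that the register pressure of a fused loop is the sum of its members' pressures, which is a simplification.

```python
def is_acyclic(group, edges):
    """Kahn's algorithm on the graph whose nodes are partitions."""
    succ = {}
    indeg = {g: 0 for g in set(group.values())}
    for (u, v) in edges:
        gu, gv = group[u], group[v]
        if gu != gv and gv not in succ.setdefault(gu, set()):
            succ[gu].add(gv)
            indeg[gv] += 1
    ready = [g for g, d in indeg.items() if d == 0]
    seen = 0
    while ready:
        g = ready.pop()
        seen += 1
        for h in succ.get(g, ()):
            indeg[h] -= 1
            if indeg[h] == 0:
                ready.append(h)
    return seen == len(indeg)


def greedy_fuse(pressure, edges, bad_pairs, max_regs):
    """Greedy sketch (hypothetical, not the paper's method).

    pressure:  {loop: registers needed}
    edges:     {(u, v): weight}, u must execute before v
    bad_pairs: set of frozenset({u, v}) that cannot be fused safely
    max_regs:  number of available registers
    Returns (partitions, total weight of cut edges).
    """
    group = {n: n for n in pressure}  # loop -> partition representative
    # Heaviest edges first: keeping them uncut saves the most weight.
    for (u, v), _ in sorted(edges.items(), key=lambda kv: -kv[1]):
        gu, gv = group[u], group[v]
        if gu == gv:
            continue
        merged = [n for n in pressure if group[n] in (gu, gv)]
        # Assumption: pressure of a fused loop is additive.
        if sum(pressure[n] for n in merged) > max_regs:
            continue
        if any(frozenset((a, b)) in bad_pairs for a in merged for b in merged):
            continue
        trial = {n: (gu if g == gv else g) for n, g in group.items()}
        # Merging must not create a cycle among partitions,
        # otherwise no valid execution order would remain.
        if is_acyclic(trial, edges):
            group = trial
    cut = sum(w for (u, v), w in edges.items() if group[u] != group[v])
    parts = {}
    for n, g in group.items():
        parts.setdefault(g, []).append(n)
    return sorted(parts.values()), cut


# Made-up example: four loops, 8 registers, L2 and L3 cannot be fused.
pressure = {"L1": 4, "L2": 4, "L3": 4, "L4": 4}
edges = {("L1", "L2"): 5, ("L2", "L3"): 3, ("L1", "L3"): 2, ("L3", "L4"): 4}
print(greedy_fuse(pressure, edges, {frozenset(("L2", "L3"))}, 8))
# prints ([['L1', 'L2'], ['L3', 'L4']], 5)
```

A greedy pass like this gives no optimality guarantee, which is consistent with the abstract's remark that the general problem is NP-hard; it only shows how the three constraints (safety, register limit, execution order) interact with the cut-weight objective.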
Pages: 340 - 355
Number of pages: 16
Related papers
50 in total
  • [31] Improving locality using loop and data transformations in an integrated framework
    Kandemir, M
    Choudhary, A
    Ramanujam, J
    Banerjee, P
    31ST ANNUAL ACM/IEEE INTERNATIONAL SYMPOSIUM ON MICROARCHITECTURE, PROCEEDINGS, 1998, : 285 - 296
  • [32] From Loop Fusion to Kernel Fusion: A Domain-Specific Approach to Locality Optimization
    Qiao, Bo
    Reiche, Oliver
    Hannig, Frank
    Teich, Juergen
    PROCEEDINGS OF THE 2019 IEEE/ACM INTERNATIONAL SYMPOSIUM ON CODE GENERATION AND OPTIMIZATION (CGO '19), 2019, : 242 - 253
  • [33] LACS: A Locality-Aware Cost-Sensitive Cache Replacement Algorithm
    Kharbutli, Mazen
    Sheikh, Rami
    IEEE TRANSACTIONS ON COMPUTERS, 2014, 63 (08) : 1975 - 1987
  • [34] BCD: To Achieve the Theoretical Optimum of Spatial Locality Based Cache Replacement Algorithm
    Zhu Xu-dong
    Ke Jian
    Xu Lu
    NAS: 2009 IEEE INTERNATIONAL CONFERENCE ON NETWORKING, ARCHITECTURE, AND STORAGE, 2009, : 269 - +
  • [35] Improving Counting Sort Algorithm Via Data Locality
    Mahmud, Shamsed
    Haque, Sardar Anisul
    Choudhury, Nazim
    ACMSE 2022: PROCEEDINGS OF THE 2022 ACM SOUTHEAST CONFERENCE, 2022, : 211 - 214
  • [36] Behavior Aware Data Placement for Improving Cache Line Level Locality in Cloud Computing
    Wang, Jianjun
    Jia, Gangyong
    Li, Aohan
    Han, Guangjie
    Shu, Lei
    JOURNAL OF INTERNET TECHNOLOGY, 2015, 16 (04): : 705 - 716
  • [37] CLQLMRS: improving cache locality in MapReduce job scheduling using Q-learning
    Ghazali, Rana
    Adabi, Sahar
    Rezaee, Ali
    Down, Douglas G.
    Movaghar, Ali
    JOURNAL OF CLOUD COMPUTING-ADVANCES SYSTEMS AND APPLICATIONS, 2022, 11 (01):
  • [38] CLQLMRS: improving cache locality in MapReduce job scheduling using Q-learning
    Rana Ghazali
    Sahar Adabi
    Ali Rezaee
    Douglas G. Down
    Ali Movaghar
    Journal of Cloud Computing, 11
  • [39] A Cache Replacement Algorithm for Improving of Service Response Time
    Nagata, Tomokazu
    Taniguchi, Yuji
    Tamaki, Shiro
    INFORMATION-AN INTERNATIONAL INTERDISCIPLINARY JOURNAL, 2009, 12 (06): : 1321 - 1333
  • [40] A fast, cache-aware algorithm for the calculation of radiological paths exploiting subword parallelism
    Christiaens, M
    De Sutter, B
    De Bosschere, K
    Van Campenhout, J
    Lemahieu, I
    JOURNAL OF SYSTEMS ARCHITECTURE, 1999, 45 (10) : 781 - 790