Loop tiling for optimization of locality and parallelism

Authors
Liu, Song [1]
Wu, Weiguo [1]
Zhao, Bo [1]
Jiang, Qing [1]
Affiliation
[1] School of Electronics and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China
Keywords
Economic and social effects - Memory architecture - Optimal systems - Codes (symbols) - Ion beams - Iterative methods
DOI
10.7544/issn1000-1239.2015.20131387
Abstract
Loop tiling is a widely used loop transformation for exposing and exploiting parallelism and data locality on modern computer architectures. Tiling techniques fall into two main categories: fixed and parameterized. This paper systematically summarizes both categories and analyzes their advantages and disadvantages. Because tile size significantly affects the performance of tiled code, it describes various methods for selecting an optimal tile size. It also surveys techniques for multi-level tiling, parallelism exploration, and imperfectly nested loops. From a detailed analysis of current research on loop tiling, the following conclusions are drawn: 1) How to balance the trade-off between the computational complexity and the generation efficiency of tiled code has not been completely solved, and how to use loop boundaries to efficiently bound the iteration space for better data locality also needs further study. 2) Optimal tile size selection remains a difficult and open question, and it would be valuable to understand how the tile sizes chosen at different levels of a hierarchical memory system influence performance. 3) From the application perspective, how to automatically generate effective tiled code for arbitrarily nested loops needs further research. In addition, how to take full advantage of shared hierarchical memory and multi-core architectures to achieve a high degree of parallelism in tiled code is another interesting direction. © 2015, Science Press. All rights reserved.
Pages: 1160-1176
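
As an illustration of the transformation the abstract describes, the following is a minimal sketch in C of a tiled matrix multiplication. It is not taken from the paper: the function names `matmul_naive` and `matmul_tiled`, the helper `min_size`, and the choice of matrix multiplication as the example kernel are all assumptions made here. The tile size is passed at run time, which loosely corresponds to the parameterized category; a fixed-tiling variant would instead hard-code it as a compile-time constant.

```c
#include <assert.h>
#include <stddef.h>

/* Smaller of two sizes; used to clip partial tiles at the matrix boundary. */
static size_t min_size(size_t a, size_t b) { return a < b ? a : b; }

/* Reference (untiled) product C = A * B for n x n row-major matrices. */
void matmul_naive(size_t n, const double *A, const double *B, double *C)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < n; k++)
                sum += A[i * n + k] * B[k * n + j];
            C[i * n + j] = sum;
        }
}

/* Tiled product: the same iterations, regrouped into tile x tile blocks
 * so that each block of A, B and C is reused while it is likely cached.
 * `tile` is a runtime parameter (hypothetical name, chosen here). */
void matmul_tiled(size_t n, size_t tile,
                  const double *A, const double *B, double *C)
{
    assert(tile > 0);

    for (size_t i = 0; i < n * n; i++)
        C[i] = 0.0;

    /* Outer loops walk over tile origins. */
    for (size_t ii = 0; ii < n; ii += tile)
        for (size_t kk = 0; kk < n; kk += tile)
            for (size_t jj = 0; jj < n; jj += tile) {
                /* Clip the tile when n is not a multiple of tile. */
                size_t i_end = min_size(ii + tile, n);
                size_t k_end = min_size(kk + tile, n);
                size_t j_end = min_size(jj + tile, n);

                /* Inner loops walk over the points inside one tile. */
                for (size_t i = ii; i < i_end; i++)
                    for (size_t k = kk; k < k_end; k++) {
                        double a = A[i * n + k];
                        for (size_t j = jj; j < j_end; j++)
                            C[i * n + j] += a * B[k * n + j];
                    }
            }
}
```

Calling `matmul_tiled(n, 32, A, B, C)` and `matmul_naive(n, A, B, C)` on the same inputs should produce the same matrix; only the order in which iterations are visited differs. Which tile size performs best depends on the cache hierarchy, which is the selection problem the abstract identifies as open.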