Loop tiling for optimization of locality and parallelism

Authors
Liu, Song [1]
Wu, Weiguo [1]
Zhao, Bo [1]
Jiang, Qing [1]
Affiliation
[1] School of Electronics and Information Engineering, Xi'an Jiaotong University, Xi'an 710049, China
Keywords
Economic and social effects - Memory architecture - Optimal systems - Codes (symbols) - Ion beams - Iterative methods
DOI
10.7544/issn1000-1239.2015.20131387
Abstract
Loop tiling is a widely used loop transformation for exposing and exploiting parallelism and data locality on modern computer architectures. Tiling techniques fall into two main categories: fixed and parameterized. This paper systematically summarizes both categories and analyzes their advantages and disadvantages. Because tile size significantly affects the performance of tiled code, various methods for selecting an optimal tile size are described. The paper also surveys techniques applied to multi-level tiling, parallelism exploration, and imperfectly nested loops. Based on a detailed analysis of current research on loop tiling, several conclusions are drawn: 1) How to balance the trade-off between computational complexity and the generation efficiency of tiled code has not been completely solved, and how to use loop boundaries to efficiently bound the iteration spaces for data locality enhancement also needs further study. 2) Optimal tile size selection remains a difficult and open question, and it would be valuable to understand how tile sizes at different levels of a hierarchical memory system influence performance. 3) From the application perspective, how to automatically generate effective tiled code for arbitrarily nested loops needs further research. In addition, how to take full advantage of shared hierarchical memory and multi-core architectures to achieve a high degree of parallelism for tiled code is another interesting direction. © 2015, Science Press. All rights reserved.
Pages: 1160-1176
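
As a rough illustration of the parameterized tiling mentioned in the abstract, the sketch below tiles a square matrix multiplication with the tile size passed as a runtime argument. It is not taken from the paper: the function names `matmul_naive` and `matmul_tiled` and the parameter `tile` are hypothetical, and plain Python lists are used only to show the loop structure (the locality benefit would appear in a compiled language, not here). The `min(...)` upper bounds are one simple way to handle tiles that overhang the edge of the iteration space.

```python
def matmul_naive(a, b, n):
    """Reference triple loop: c = a * b for n x n integer matrices."""
    c = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                c[i][j] += a[i][k] * b[k][j]
    return c


def matmul_tiled(a, b, n, tile):
    """Same computation with all three loops tiled.

    `tile` is a runtime parameter (parameterized tiling); a fixed-tiling
    variant would hard-code this value when the code is generated.
    """
    c = [[0] * n for _ in range(n)]
    # Inter-tile loops: step through the iteration space one tile at a time.
    for ii in range(0, n, tile):
        for jj in range(0, n, tile):
            for kk in range(0, n, tile):
                # Intra-tile loops: min() clips partial tiles at the boundary.
                for i in range(ii, min(ii + tile, n)):
                    for j in range(jj, min(jj + tile, n)):
                        acc = c[i][j]
                        for k in range(kk, min(kk + tile, n)):
                            acc += a[i][k] * b[k][j]
                        c[i][j] = acc
    return c
```

Under these assumptions, calling `matmul_tiled(a, b, n, tile)` with any positive `tile` should give the same result as `matmul_naive(a, b, n)`; only the order in which iterations are visited changes.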