Understanding Stencil Code Performance On MultiCore Architectures

Cited by: 23
Authors
Rahman, Shah M. Faizur [1]
Yi, Qing [1]
Qasem, Apan [2]
Affiliations
[1] Univ Texas San Antonio, Dept Comp Sci, San Antonio, TX 78249 USA
[2] Texas State Univ, Dept Comp Sci, San Marcos, TX USA
Funding
National Science Foundation (USA)
DOI
10.1145/2016604.2016641
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Stencil computations are the foundation of many large applications in scientific computing. Previous research has shown that several optimization mechanisms, including rectangular blocking and time skewing combined with wavefront- and pipeline-based parallelization, can be used to significantly improve the performance of stencil kernels on multi-core architectures. However, the overall performance impact of these optimizations is difficult to predict due to the interplay of load imbalance, synchronization overhead, and cache locality. This paper presents a detailed performance study of these optimizations by applying them with a wide variety of configurations, using hardware counters to monitor the efficiency of architectural components, and then developing a set of formulas via regression analysis to model their overall performance impact in terms of the affected hardware counter numbers. We have applied our methodology to three stencil computation kernels: a 7-point Jacobi, a 27-point Jacobi, and a 7-point Gauss-Seidel computation. Our experimental results show that a precise formula can be developed for each kernel to accurately model the overall performance impact of varying optimizations and thereby effectively guide the performance analysis and tuning of these kernels.
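
For readers unfamiliar with the kernels named in the abstract, the following is a minimal sketch of a 7-point Jacobi sweep with optional rectangular blocking of the two inner dimensions. It is not the authors' code: the function names `jacobi7_sweep` and `jacobi7`, the flat-list grid layout, the uniform 1/7 averaging weight, and the `block` parameter are all assumptions made for illustration, and the paper's kernels may use different coefficients and data layouts.

```python
def jacobi7_sweep(src, dst, n, alpha=1.0 / 7.0, block=None):
    """One 7-point Jacobi sweep over the interior of an n*n*n grid.

    src and dst are flat lists of length n**3; boundary cells are not written.
    block=(bj, bk), if given, tiles the j and k loops (rectangular blocking).
    The uniform weight alpha is an illustrative assumption.
    """
    def idx(i, j, k):
        return (i * n + j) * n + k

    bj, bk = block if block else (n, n)
    for jj in range(1, n - 1, bj):          # tile origin in j
        for kk in range(1, n - 1, bk):      # tile origin in k
            for i in range(1, n - 1):
                for j in range(jj, min(jj + bj, n - 1)):
                    for k in range(kk, min(kk + bk, n - 1)):
                        # Centre cell plus its six face neighbours.
                        dst[idx(i, j, k)] = alpha * (
                            src[idx(i, j, k)]
                            + src[idx(i - 1, j, k)] + src[idx(i + 1, j, k)]
                            + src[idx(i, j - 1, k)] + src[idx(i, j + 1, k)]
                            + src[idx(i, j, k - 1)] + src[idx(i, j, k + 1)]
                        )


def jacobi7(grid, n, steps, block=None):
    """Run `steps` sweeps, swapping buffers after each one; returns the final grid."""
    src, dst = list(grid), list(grid)  # both copies carry the fixed boundary values
    for _ in range(steps):
        jacobi7_sweep(src, dst, n, block=block)
        src, dst = dst, src
    return src
```

Because a Jacobi sweep reads only from the previous time step, blocking changes the order in which cells are visited but not the values computed, so `jacobi7(grid, n, steps)` and `jacobi7(grid, n, steps, block=(2, 2))` return the same grid. A Gauss-Seidel sweep, by contrast, updates the grid in place, so it carries dependences within a sweep; those dependences are what wavefront- and pipeline-style schemes have to respect when parallelizing.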
Pages: 10