Understanding Stencil Code Performance on Multicore Architectures

Cited by: 23
Authors
Rahman, Shah M. Faizur [1]
Yi, Qing [1]
Qasem, Apan [2]
Affiliations
[1] Univ Texas San Antonio, Dept Comp Sci, San Antonio, TX 78249 USA
[2] Texas State Univ, Dept Comp Sci, San Marcos, TX USA
Funding
US National Science Foundation
DOI
10.1145/2016604.2016641
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Stencil computations are the foundation of many large applications in scientific computing. Previous research has shown that several optimization mechanisms, including rectangular blocking and time skewing combined with wavefront- and pipeline-based parallelization, can significantly improve the performance of stencil kernels on multi-core architectures. However, the overall performance impact of these optimizations is difficult to predict because of the interplay of load imbalance, synchronization overhead, and cache locality. This paper presents a detailed performance study of these optimizations by applying them in a wide variety of configurations, using hardware counters to monitor the efficiency of architectural components, and then developing a set of formulas via regression analysis to model their overall performance impact in terms of the affected hardware counter numbers. We have applied our methodology to three stencil computation kernels: a 7-point Jacobi, a 27-point Jacobi, and a 7-point Gauss-Seidel computation. Our experimental results show that a precise formula can be developed for each kernel to accurately model the overall performance impact of varying optimizations and thereby effectively guide the performance analysis and tuning of these kernels.
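As a rough illustration of one optimization the abstract names, the sketch below applies rectangular blocking to a single 7-point Jacobi sweep. It is not the authors' code: the function names `jacobi7_sweep` and `run_demo`, the equal 1/7 weights, the flat-list grid layout, and the block sizes are all assumptions chosen for brevity. Blocking changes only the order in which interior points are visited, so a blocked sweep and an unblocked sweep should produce identical output.

```python
def jacobi7_sweep(src, dst, n, block=(4, 4, 4)):
    """One 7-point Jacobi sweep over the interior of an n*n*n grid.

    src and dst are flat lists of length n**3 (hypothetical layout:
    index = (i*n + j)*n + k). Boundary cells of dst are left untouched.
    The equal 1/7 weights are an assumption made for illustration.
    """
    bi, bj, bk = block
    # Outer three loops walk over rectangular blocks of the interior.
    for ii in range(1, n - 1, bi):
        for jj in range(1, n - 1, bj):
            for kk in range(1, n - 1, bk):
                # Inner three loops update every point inside one block.
                for i in range(ii, min(ii + bi, n - 1)):
                    for j in range(jj, min(jj + bj, n - 1)):
                        for k in range(kk, min(kk + bk, n - 1)):
                            c = (i * n + j) * n + k
                            dst[c] = (src[c]
                                      + src[c - 1] + src[c + 1]
                                      + src[c - n] + src[c + n]
                                      + src[c - n * n] + src[c + n * n]) / 7.0


def run_demo(n=6, block=(2, 3, 4)):
    """Run one blocked and one unblocked sweep on the same input grid."""
    src = [float((i * 7 + 3) % 11) for i in range(n ** 3)]
    blocked = list(src)
    unblocked = list(src)
    jacobi7_sweep(src, blocked, n, block)
    # A block as large as the grid degenerates to the plain loop nest.
    jacobi7_sweep(src, unblocked, n, (n, n, n))
    return blocked, unblocked
```

Calling `run_demo()` returns the two result grids for comparison. In a compiled language the block shape affects cache reuse, which is the kind of effect the paper measures with hardware counters; this pure-Python version only demonstrates that the loop restructuring preserves the result.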
Pages: 10
Related papers
50 in total
  • [31] MODELING THE PERFORMANCE OF GEOMETRIC MULTIGRID STENCILS ON MULTICORE COMPUTER ARCHITECTURES
    Ghysels, Pieter
    Vanroose, Wim
    SIAM JOURNAL ON SCIENTIFIC COMPUTING, 2015, 37 (02): : C194 - C216
  • [32] Performance Characterization and Evaluation of HPC Algorithms on Dissimilar Multicore Architectures
    Krishnan, S. P. T.
    Veeravalli, Bharadwaj
    2014 IEEE INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS, 2014 IEEE 6TH INTL SYMP ON CYBERSPACE SAFETY AND SECURITY, 2014 IEEE 11TH INTL CONF ON EMBEDDED SOFTWARE AND SYST (HPCC,CSS,ICESS), 2014, : 1288 - 1295
  • [33] Comparing performance of C compilers optimizations on different multicore architectures
    Machado, Roger S.
    Almeida, Ricardo B.
    Jardim, Andre D.
    Pernas, Ana M.
    Yamin, Adenauer C.
    Cavalheiro, Gerson Geraldo H.
    2017 INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE AND HIGH PERFORMANCE COMPUTING WORKSHOPS (SBAC-PADW), 2017, : 25 - 30
  • [34] On the Performance and Energy Efficiency of the PGAS Programming Model on Multicore Architectures
    Lagraviere, Jeremie
    Langguth, Johannes
    Sourouri, Mohammed
    Ha, Phuong H.
    Cai, Xing
    2016 INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING & SIMULATION (HPCS 2016), 2016, : 800 - 807
  • [35] Understanding Fundamental Design Choices in Single-ISA Heterogeneous Multicore Architectures
    Van Craeynest, Kenzo
    Eeckhout, Lieven
    ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION, 2013, 9 (04)
  • [36] Code Refinement of Stencil Codes
    Koster, Marcel
    Leissa, Roland
    Hack, Sebastian
    Membarth, Richard
    Slusallek, Philipp
    PARALLEL PROCESSING LETTERS, 2014, 24 (03)
  • [37] Understanding the Performance of Stencil Computations on Intel's Xeon Phi
    Peraza, Joshua
    Tiwari, Ananta
    Laurenzano, Michael
    Carrington, Laura
    Ward, William A.
    Campbell, Roy
    2013 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING (CLUSTER), 2013,
  • [38] Comments on CFD code performance on scalable architectures
    Behr, M
    Pressel, DM
    Sturek, WB
    COMPUTER METHODS IN APPLIED MECHANICS AND ENGINEERING, 2000, 190 (3-4) : 263 - 277
  • [39] High-Order Stencil Computations on Multicore Clusters
    Peng, Liu
    Seymour, Richard
    Nomura, Ken-ichi
    Kalia, Rajiv K.
    Nakano, Aiichiro
    Vashishta, Priya
    Loddoch, Alexander
    Netzband, Michael
    Volz, William R.
    Wong, Chap C.
    2009 IEEE INTERNATIONAL SYMPOSIUM ON PARALLEL & DISTRIBUTED PROCESSING, VOLS 1-5, 2009, : 315 - +
  • [40] High performance low cost multicore NoC architectures for embedded systems
    Tutsch, Dietmar
    Hommel, Guenter
    EMBEDDED SYSTEMS - MODELING, TECHNOLOGY AND APPLICATIONS, PROCEEDINGS, 2006, : 53 - +