Understanding Stencil Code Performance On MultiCore Architectures

Cited by: 23

Authors:

Rahman, Shah M. Faizur ^{[1]}

Yi, Qing ^{[1]}

Qasem, Apan ^{[2]}

Affiliations:

[1] Univ Texas San Antonio, Dept Comp Sci, San Antonio, TX 78249 USA

[2] Texas State Univ, Dept Comp Sci, San Marcos, TX USA

Source:

PROCEEDINGS OF THE 2011 8TH ACM INTERNATIONAL CONFERENCE ON COMPUTING FRONTIERS (CF 2011) | 2011

Funding:

National Science Foundation (USA);

Keywords:

DOI:

10.1145/2016604.2016641

Chinese Library Classification number:

TP301 [Theory, Methods];

Discipline classification code:

081202;

Abstract:

Stencil computations are the foundation of many large applications in scientific computing. Previous research has shown that several optimization mechanisms, including rectangular blocking and time skewing combined with wavefront- and pipeline-based parallelization, can significantly improve the performance of stencil kernels on multi-core architectures. However, the overall performance impact of these optimizations is difficult to predict due to the interplay of load imbalance, synchronization overhead, and cache locality. This paper presents a detailed performance study of these optimizations by applying them in a wide variety of configurations, using hardware counters to monitor the efficiency of architectural components, and then developing a set of formulas via regression analysis to model their overall performance impact in terms of the affected hardware counter numbers. We have applied our methodology to three stencil computation kernels: a 7-point Jacobi, a 27-point Jacobi, and a 7-point Gauss-Seidel computation. Our experimental results show that a precise formula can be developed for each kernel to accurately model the overall performance impact of varying optimizations and thereby effectively guide the performance analysis and tuning of these kernels.
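
For readers unfamiliar with the kernels named in the abstract, the following is a minimal sketch of a 7-point Jacobi sweep with optional rectangular blocking. It is not the authors' code: the function names (`make_grid`, `jacobi7_sweep`, `run`), the uniform 1/7 averaging weights, the cubic grid, and the fixed boundary face are all assumptions chosen for illustration, and time skewing and parallelization are not shown.

```python
def make_grid(n, fill=0.0):
    """Allocate an n x n x n grid as nested lists (illustrative helper)."""
    return [[[fill for _ in range(n)] for _ in range(n)] for _ in range(n)]


def jacobi7_sweep(src, dst, n, block=None):
    """One 7-point Jacobi sweep over the interior points of an n^3 grid.

    block=None visits the interior as a single tile (unblocked); otherwise
    the interior is traversed in cubic tiles of edge length `block`.
    """
    b = block if block else n
    for ii in range(1, n - 1, b):
        for jj in range(1, n - 1, b):
            for kk in range(1, n - 1, b):
                # Inner loops stay inside one tile, clipped at the boundary.
                for i in range(ii, min(ii + b, n - 1)):
                    for j in range(jj, min(jj + b, n - 1)):
                        for k in range(kk, min(kk + b, n - 1)):
                            # Assumed weights: plain average of the point
                            # and its six face neighbours.
                            dst[i][j][k] = (
                                src[i][j][k]
                                + src[i - 1][j][k] + src[i + 1][j][k]
                                + src[i][j - 1][k] + src[i][j + 1][k]
                                + src[i][j][k - 1] + src[i][j][k + 1]
                            ) / 7.0


def run(n, steps, block=None):
    """Run `steps` sweeps with the i = 0 face held at 1.0 (assumed boundary)."""
    a = make_grid(n)
    b = make_grid(n)
    for j in range(n):
        for k in range(n):
            a[0][j][k] = 1.0
            b[0][j][k] = 1.0
    for _ in range(steps):
        jacobi7_sweep(a, b, n, block)
        a, b = b, a  # the freshly written grid becomes the next input
    return a
```

Because a Jacobi sweep reads only the previous grid, blocking changes the traversal order (and hence cache behaviour) but not the computed values, so `run(8, 3)` and `run(8, 3, block=3)` should produce identical grids.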

Pages: 10
