Understanding Stencil Code Performance on Multicore Architectures

Cited by: 23

Authors:

Rahman, Shah M. Faizur [1]

Yi, Qing [1]

Qasem, Apan [2]

Affiliations:

[1] Univ Texas San Antonio, Dept Comp Sci, San Antonio, TX 78249 USA

[2] Texas State Univ, Dept Comp Sci, San Marcos, TX USA

Source:

PROCEEDINGS OF THE 2011 8TH ACM INTERNATIONAL CONFERENCE ON COMPUTING FRONTIERS (CF 2011) | 2011

Funding:

U.S. National Science Foundation

Keywords:

DOI:

10.1145/2016604.2016641

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods];

Discipline classification code:

081202;

Abstract:

Stencil computations are the foundation of many large applications in scientific computing. Previous research has shown that several optimization mechanisms, including rectangular blocking and time skewing combined with wavefront- and pipeline-based parallelization, can be used to significantly improve the performance of stencil kernels on multi-core architectures. However, the overall performance impact of these optimizations is difficult to predict due to the interplay of load imbalance, synchronization overhead, and cache locality. This paper presents a detailed performance study of these optimizations by applying them with a wide variety of configurations, using hardware counters to monitor the efficiency of architectural components, and then developing a set of formulas via regression analysis to model their overall performance impact in terms of the affected hardware counter numbers. We have applied our methodology to three stencil computation kernels: a 7-point Jacobi, a 27-point Jacobi, and a 7-point Gauss-Seidel computation. Our experimental results show that a precise formula can be developed for each kernel to accurately model the overall performance impact of varying optimizations and thereby effectively guide the performance analysis and tuning of these kernels.
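
For readers unfamiliar with the kernels named in the abstract, the following is a minimal, unoptimized sketch of a 7-point Jacobi sweep on a 3D grid. It is an illustration only and is not taken from the paper: the function names `jacobi7_sweep` and `run_jacobi7`, the flat-list storage layout, and the coefficients `alpha` and `beta` are all assumptions. Blocking, time skewing, and parallelization are deliberately left out.

```python
def jacobi7_sweep(src, dst, n, alpha=0.5, beta=1.0 / 12.0):
    """Apply one 7-point Jacobi update to the interior of an n*n*n grid.

    src and dst are flat lists of length n**3 (hypothetical layout).
    Each interior point becomes a weighted sum of itself and its six
    face neighbors, read only from src, so updates do not interfere.
    """
    def idx(i, j, k):
        return (i * n + j) * n + k

    for i in range(1, n - 1):
        for j in range(1, n - 1):
            for k in range(1, n - 1):
                neighbors = (
                    src[idx(i - 1, j, k)] + src[idx(i + 1, j, k)]
                    + src[idx(i, j - 1, k)] + src[idx(i, j + 1, k)]
                    + src[idx(i, j, k - 1)] + src[idx(i, j, k + 1)]
                )
                dst[idx(i, j, k)] = alpha * src[idx(i, j, k)] + beta * neighbors


def run_jacobi7(grid, n, steps):
    """Run `steps` Jacobi sweeps, swapping the two buffers after each one.

    Boundary values are never written, so they stay fixed at their
    initial values in both buffers.
    """
    src = list(grid)
    dst = list(grid)
    for _ in range(steps):
        jacobi7_sweep(src, dst, n)
        src, dst = dst, src
    return src
```

A Gauss-Seidel variant would instead update a single array in place, so that each point reads neighbor values already updated in the same sweep; that dependence is what makes its parallelization differ from Jacobi.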

Pages: 10

50 records in total

[1] Understanding the thermal implications of multicore architectures
Chaparro, Pedro
Gonzalez, Jose
Magklis, Grigorios
Cai, Qiong
Gonzalez, Antonio
IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2007, 18 (08) : 1055 - 1065
[2] High Performance Code Generation for Stencil Computation on Heterogeneous Multi-device Architectures
Li, Pei
Brunet, Elisabeth
Namyst, Raymond
2013 IEEE 15TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS & 2013 IEEE INTERNATIONAL CONFERENCE ON EMBEDDED AND UBIQUITOUS COMPUTING (HPCC_EUC), 2013, : 1512 - 1518
[3] Characterizing Performance and Cache Impacts of Code Multi-Versioning on Multicore Architectures
Zangerl, Peter
Thoman, Peter
Fahringer, Thomas
2017 25TH EUROMICRO INTERNATIONAL CONFERENCE ON PARALLEL, DISTRIBUTED AND NETWORK-BASED PROCESSING (PDP 2017), 2017, : 209 - 213
[4] A Predictive Performance Model for Stencil Codes on Multicore CPUs
Schaefer, Andreas
Fey, Dietmar
HIGH PERFORMANCE COMPUTING FOR COMPUTATIONAL SCIENCE - VECPAR 2012, 2013, 7851 : 451 - 466
[5] A predictive performance model for stencil codes on multicore CPUs
Schäfer, Andreas
Fey, Dietmar
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2013, 7851 LNCS : 451 - 466
[6] Stencil Computation Optimization and Auto-tuning on State-of-the-Art Multicore Architectures
Datta, Kaushik
Murphy, Mark
Volkov, Vasily
Williams, Samuel
Carter, Jonathan
Oliker, Leonid
Patterson, David
Shalf, John
Yelick, Katherine
INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS, 2008, : 510 - +
[7] Application scalability and performance on multicore architectures
Simon, Tyler A.
Cable, Sam B.
Mahmoodi, Mahin
PROCEEDINGS OF THE HPCMP USERS GROUP CONFERENCE 2007, 2007, : 378 - 381
[8] High Performance Stencil Code Generation with LIFT
Hagedorn, Bastian
Stoltzfus, Larisa
Steuwer, Michel
Gorlatch, Sergei
Dubach, Christophe
PROCEEDINGS OF THE 2018 INTERNATIONAL SYMPOSIUM ON CODE GENERATION AND OPTIMIZATION (CGO'18), 2018, : 100 - 112
[9] High Performance Stencil Code Algorithms for GPGPUs
Schaefer, Andreas
Fey, Dietmar
PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE (ICCS), 2011, 4 : 2027 - 2036
[10] Optimization and Performance Modeling of Stencil Computations on ARM Architectures
Zhang, Kaifang
Su, Huayou
Zhang, Peng
Dou, Yong
Proceedings - 2020 IEEE 22nd International Conference on High Performance Computing and Communications, IEEE 18th International Conference on Smart City and IEEE 6th International Conference on Data Science and Systems, HPCC-SmartCity-DSS 2020, 2020, : 113 - 121