Understanding Stencil Code Performance On MultiCore Architectures

Cited by: 23

Authors:

Rahman, Shah M. Faizur [1]

Yi, Qing [1]

Qasem, Apan [2]

Affiliations:

[1] Univ Texas San Antonio, Dept Comp Sci, San Antonio, TX 78249 USA

[2] Texas State Univ, Dept Comp Sci, San Marcos, TX USA

Source:

PROCEEDINGS OF THE 2011 8TH ACM INTERNATIONAL CONFERENCE ON COMPUTING FRONTIERS (CF 2011) | 2011

Funding:

U.S. National Science Foundation

Keywords:

DOI:

10.1145/2016604.2016641

Chinese Library Classification number:

TP301 [Theory, Methods]

Discipline classification code:

081202

Abstract:

Stencil computations are the foundation of many large applications in scientific computing. Previous research has shown that several optimization mechanisms, including rectangular blocking and time skewing combined with wavefront- and pipeline-based parallelization, can significantly improve the performance of stencil kernels on multi-core architectures. However, the overall performance impact of these optimizations is difficult to predict because of the interplay of load imbalance, synchronization overhead, and cache locality. This paper presents a detailed performance study of these optimizations by applying them under a wide variety of configurations, using hardware counters to monitor the efficiency of architectural components, and then developing a set of formulas via regression analysis to model their overall performance impact in terms of the affected hardware counter values. We have applied our methodology to three stencil computation kernels: a 7-point Jacobi, a 27-point Jacobi, and a 7-point Gauss-Seidel computation. Our experimental results show that a precise formula can be developed for each kernel to accurately model the overall performance impact of varying optimizations and thereby effectively guide the performance analysis and tuning of these kernels.
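For readers unfamiliar with the kernels named in the abstract, the following is a minimal sketch of a 7-point Jacobi sweep and a rectangularly blocked variant of it. It is an illustration only and not the authors' code: the function names (`make_grid`, `update_point`, `jacobi7_sweep`, `jacobi7_sweep_blocked`), the equal weighting of the seven points, the cubic grid, and the single `block` size are all assumptions made for this example. Time skewing and the wavefront and pipeline parallelization schemes are not shown.

```python
def make_grid(n, fill=0.0):
    """Allocate an n x n x n grid as nested lists (hypothetical helper)."""
    return [[[fill for _ in range(n)] for _ in range(n)] for _ in range(n)]


def update_point(src, dst, i, j, k):
    """Write the average of a point and its six face neighbors into dst.

    The equal 1/7 weights are an assumption; real kernels may use others.
    """
    dst[i][j][k] = (
        src[i][j][k]
        + src[i - 1][j][k] + src[i + 1][j][k]
        + src[i][j - 1][k] + src[i][j + 1][k]
        + src[i][j][k - 1] + src[i][j][k + 1]
    ) / 7.0


def jacobi7_sweep(src, dst):
    """One unblocked Jacobi sweep over the interior points of the grid."""
    n = len(src)
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            for k in range(1, n - 1):
                update_point(src, dst, i, j, k)


def jacobi7_sweep_blocked(src, dst, block=4):
    """The same sweep, visiting the interior one rectangular block at a time.

    Jacobi reads only from src and writes only to dst, so the blocks can be
    processed in any order without changing the result.
    """
    n = len(src)
    for i0 in range(1, n - 1, block):
        for j0 in range(1, n - 1, block):
            for k0 in range(1, n - 1, block):
                for i in range(i0, min(i0 + block, n - 1)):
                    for j in range(j0, min(j0 + block, n - 1)):
                        for k in range(k0, min(k0 + block, n - 1)):
                            update_point(src, dst, i, j, k)
```

Both functions compute the same values for every interior point; blocking changes only the traversal order, which is what affects cache locality. A Gauss-Seidel variant would update the grid in place, so its traversal order could not be changed as freely.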

Pages: 10
