Understanding Stencil Code Performance On MultiCore Architectures

Cited by: 23
Authors
Rahman, Shah M. Faizur [1]
Yi, Qing [1]
Qasem, Apan [2]
Affiliations
[1] Univ Texas San Antonio, Dept Comp Sci, San Antonio, TX 78249 USA
[2] Texas State Univ, Dept Comp Sci, San Marcos, TX USA
Funding
U.S. National Science Foundation
DOI
10.1145/2016604.2016641
Chinese Library Classification
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Stencil computations are the foundation of many large applications in scientific computing. Previous research has shown that several optimization mechanisms, including rectangular blocking and time skewing combined with wavefront- and pipeline-based parallelization, can significantly improve the performance of stencil kernels on multi-core architectures. However, the overall performance impact of these optimizations is difficult to predict due to the interplay of load imbalance, synchronization overhead, and cache locality. This paper presents a detailed performance study of these optimizations by applying them in a wide variety of configurations, using hardware counters to monitor the efficiency of architectural components, and then developing a set of formulas via regression analysis to model their overall performance impact in terms of the affected hardware counter values. We have applied our methodology to three stencil computation kernels: a 7-point Jacobi, a 27-point Jacobi, and a 7-point Gauss-Seidel computation. Our experimental results show that a precise formula can be developed for each kernel to accurately model the overall performance impact of varying optimizations and thereby effectively guide the performance analysis and tuning of these kernels.
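For readers unfamiliar with the kernels named in the abstract, the following is a minimal, unoptimized sketch of one sweep of a 7-point Jacobi stencil on a 3D grid. It is not the authors' code: the function names `jacobi7_sweep` and `make_grid`, the nested-list grid layout, and the uniform coefficient `alpha` are assumptions chosen for illustration, and none of the blocking, time-skewing, or parallelization strategies studied in the paper are applied.

```python
def make_grid(n, value=0.0):
    """Hypothetical helper: build an n x n x n grid as nested lists."""
    return [[[value] * n for _ in range(n)] for _ in range(n)]


def jacobi7_sweep(src, alpha=1.0 / 7.0):
    """Return a new grid after one 7-point Jacobi sweep over src.

    Each interior cell becomes alpha times the sum of itself and its
    six face neighbors. Boundary cells are copied unchanged.
    The uniform coefficient alpha is an illustrative assumption.
    """
    nz, ny, nx = len(src), len(src[0]), len(src[0][0])
    # Jacobi reads only from src and writes only to dst, so every
    # cell update within a sweep is independent of the others.
    dst = [[row[:] for row in plane] for plane in src]
    for k in range(1, nz - 1):
        for j in range(1, ny - 1):
            for i in range(1, nx - 1):
                dst[k][j][i] = alpha * (
                    src[k][j][i]
                    + src[k - 1][j][i] + src[k + 1][j][i]
                    + src[k][j - 1][i] + src[k][j + 1][i]
                    + src[k][j][i - 1] + src[k][j][i + 1]
                )
    return dst
```

Because each sweep reads one array and writes another, all updates within a sweep can run in any order. A Gauss-Seidel sweep would instead update the grid in place, so later cells see values already updated in the same sweep, which is one reason the abstract treats the two kernel types separately.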
Pages: 10