Automatic Partitioning of Stencil Computations on Heterogeneous Systems

Cited by: 4

Authors:

Pereira, Alyson D. [1]

Rocha, Rodrigo C. O. [4]

Ramos, Luiz [3]

Castro, Marcio [1]

Goes, Luis F. W. [2]

Affiliations:

[1] Univ Fed Santa Catarina, Florianopolis, SC, Brazil

[2] Pontificia Univ Catolica Minas Gerais, Belo Horizonte, MG, Brazil

[3] Univ Estadual Campinas, Campinas, SP, Brazil

[4] Univ Edinburgh, Edinburgh, Midlothian, Scotland

Source:

2017 INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE AND HIGH PERFORMANCE COMPUTING WORKSHOPS (SBAC-PADW) | 2017

Keywords:

Stencil; Work Partitioning; Decision Tree Learning

DOI:

10.1109/SBAC-PADW.2017.16

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology]

Discipline classification code:

0812

Abstract:

The stencil pattern is important in many scientific and engineering domains, spurring great interest from researchers and industry. In recent years, various optimizations have been proposed for parallel stencil applications running on GPUs. However, most runtime systems that execute those applications fail to fully utilize the parallelism of modern heterogeneous systems. In this paper, we propose a mechanism based on machine learning that automatically partitions stencil computations across CPU and GPU. We implemented it in the PSkel framework and found that the mechanism can boost the performance of stencil applications on average by 17.9x compared to their sequential CPU-only counterparts, by 1.34x compared to a GPU-only version, and by 1.48x compared to a parallel CPU-only version.


Pages: 43-48

Page count: 6

