Automatic tuning to performance modelling of matrix polynomials on multicore and multi-GPU systems

Cited by: 2

Authors:

Boratto, Murilo [1]

Alonso, Pedro [2]

Gimenez, Domingo [3]

Lastovetsky, Alexey [4]

Affiliations:

[1] Univ Estado Bahia, Nucleo Arquitetura Comp & Sistemas Operacionais, Salvador, BA, Brazil

[2] Univ Politecn Valencia, Dept Sistemas Informat & Comp, Valencia, Spain

[3] Univ Murcia, Dept Sistemas Informat, Murcia, Spain

[4] Univ Coll Dublin, Sch Comp Sci, Heterogeneous Comp Lab, Dublin, Ireland

Source:

JOURNAL OF SUPERCOMPUTING | 2017 / Vol. 73 / Issue 01

Keywords:

Automatic tuning; Matrix polynomials; Performance; Multicore; Multi-GPU;

DOI:

10.1007/s11227-016-1694-y

Chinese Library Classification number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812;

Abstract:

Automatic tuning methodologies have been used in the design of routines in recent years. The goal of these methodologies is to develop routines which automatically adapt to the conditions of the underlying computational system so that efficient executions are obtained independently of the end-user experience. This paper aims to explore programming routines that can automatically be adapted to the computational system conditions thanks to these automatic tuning methodologies. In particular, we have worked on the evaluation of matrix polynomials on multicore and multi-GPU systems as a target application. This application is very useful for the computation of matrix functions like the sine or cosine but, at the same time, the application is very time consuming since the basic computational kernel, which is the matrix multiplication, is carried out many times. The use of all available resources within a node in an easy and efficient way is crucial for the end user.


Pages: 227-239

Number of pages: 13

