Automatic tuning to performance modelling of matrix polynomials on multicore and multi-GPU systems

Cited by: 2
Authors
Boratto, Murilo [1]
Alonso, Pedro [2]
Gimenez, Domingo [3]
Lastovetsky, Alexey [4]
Affiliations
[1] Univ Estado Bahia, Nucleo Arquitetura Comp & Sistemas Operacionais, Salvador, BA, Brazil
[2] Univ Politecn Valencia, Dept Sistemas Informat & Comp, Valencia, Spain
[3] Univ Murcia, Dept Sistemas Informat, Murcia, Spain
[4] Univ Coll Dublin, Sch Comp Sci, Heterogeneous Comp Lab, Dublin, Ireland
Source
JOURNAL OF SUPERCOMPUTING | 2017 / Vol. 73 / No. 01
Keywords
Automatic tuning; Matrix polynomials; Performance; Multicore; Multi-GPU
DOI
10.1007/s11227-016-1694-y
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Automatic tuning methodologies have been used in the design of routines in recent years. The goal of these methodologies is to develop routines which automatically adapt to the conditions of the underlying computational system so that efficient executions are obtained independently of the end-user experience. This paper aims to explore programming routines that can automatically be adapted to the computational system conditions thanks to these automatic tuning methodologies. In particular, we have worked on the evaluation of matrix polynomials on multicore and multi-GPU systems as a target application. This application is very useful for the computation of matrix functions like the sine or cosine but, at the same time, the application is very time consuming since the basic computational kernel, which is the matrix multiplication, is carried out many times. The use of all available resources within a node in an easy and efficient way is crucial for the end user.
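As an illustration of why matrix multiplication dominates the cost of this application, the following minimal sketch evaluates a matrix polynomial P(X) = c0*I + c1*X + ... + cd*X^d with Horner's rule in plain Python. It is not the authors' implementation: the abstract does not state which evaluation scheme or libraries the paper uses, and the function names `mat_mul` and `mat_poly_horner` are hypothetical. The sketch only shows that a degree-d polynomial already needs d matrix products under this scheme, which is the kernel a tuned routine would distribute across cores and GPUs.

```python
def mat_mul(a, b):
    """Plain triple-loop product of two matrices stored as lists of rows."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def mat_poly_horner(coeffs, x):
    """Evaluate sum(coeffs[i] * X**i) for a square matrix x using Horner's rule.

    coeffs[i] is the scalar coefficient of X**i (hypothetical convention).
    The loop performs len(coeffs) - 1 matrix products.
    """
    n = len(x)
    if not coeffs:
        return [[0] * n for _ in range(n)]
    # Start from c_d * I, the scaled identity matrix.
    result = [[coeffs[-1] if i == j else 0 for j in range(n)] for i in range(n)]
    for c in reversed(coeffs[:-1]):
        result = mat_mul(result, x)   # the expensive kernel
        for i in range(n):
            result[i][i] += c         # add c * I
    return result


if __name__ == "__main__":
    X = [[1, 1], [0, 1]]
    # P(X) = I + 2X + 3X^2
    print(mat_poly_horner([1, 2, 3], X))
```

In a tuned setting the call to `mat_mul` would be replaced by an optimized library routine, and the choice of how to split that work between devices is the kind of decision the abstract attributes to automatic tuning.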
Pages: 227-239
Number of pages: 13