Automatic tuning to performance modelling of matrix polynomials on multicore and multi-GPU systems

Cited by: 2
Authors
Boratto, Murilo [1]
Alonso, Pedro [2]
Gimenez, Domingo [3]
Lastovetsky, Alexey [4]
Affiliations
[1] Univ Estado Bahia, Nucleo Arquitetura Comp & Sistemas Operacionais, Salvador, BA, Brazil
[2] Univ Politecn Valencia, Dept Sistemas Informat & Comp, Valencia, Spain
[3] Univ Murcia, Dept Sistemas Informat, Murcia, Spain
[4] Univ Coll Dublin, Sch Comp Sci, Heterogeneous Comp Lab, Dublin, Ireland
Source
JOURNAL OF SUPERCOMPUTING | 2017, Vol. 73, No. 1
Keywords
Automatic tuning; Matrix polynomials; Performance; Multicore; Multi-GPU
DOI
10.1007/s11227-016-1694-y
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Automatic tuning methodologies have been used in the design of routines in recent years. The goal of these methodologies is to develop routines which automatically adapt to the conditions of the underlying computational system so that efficient executions are obtained independently of the end-user experience. This paper aims to explore programming routines that can automatically be adapted to the computational system conditions thanks to these automatic tuning methodologies. In particular, we have worked on the evaluation of matrix polynomials on multicore and multi-GPU systems as a target application. This application is very useful for the computation of matrix functions like the sine or cosine but, at the same time, the application is very time consuming since the basic computational kernel, which is the matrix multiplication, is carried out many times. The use of all available resources within a node in an easy and efficient way is crucial for the end user.
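To illustrate why matrix multiplication dominates the cost of this application, here is a minimal sketch that evaluates a matrix polynomial with Horner's rule and counts the matrix products it performs. It is an assumption for illustration only: the abstract does not say which evaluation scheme the paper uses, and the names `matmul` and `eval_matrix_polynomial` are hypothetical, not part of the authors' routines.

```python
# Illustrative sketch only, not the authors' routine.
# Evaluates P(A) = c[0]*I + c[1]*A + ... + c[d]*A^d with Horner's rule.

def matmul(x, y):
    """Multiply two square matrices stored as lists of rows."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def eval_matrix_polynomial(coeffs, a):
    """Return (P(A), number of matrix products) for coefficients c[0..d]."""
    if not coeffs:
        raise ValueError("coeffs must contain at least one coefficient")
    n = len(a)
    # Start from c[d] * I, the innermost term of the Horner form.
    result = [[coeffs[-1] if i == j else 0 for j in range(n)]
              for i in range(n)]
    products = 0
    # Work down from c[d-1] to c[0]: result = result * A + c * I.
    for c in reversed(coeffs[:-1]):
        result = matmul(result, a)
        products += 1
        for i in range(n):
            result[i][i] += c
    return result, products


if __name__ == "__main__":
    a = [[1, 1], [0, 1]]
    value, products = eval_matrix_polynomial([1, 2, 3], a)
    print(value, products)
```

In this sketch a polynomial of degree d costs d matrix products, each of cubic cost in the matrix dimension, which is why the multiplication kernel is the natural target for tuning across cores and GPUs.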
Pages: 227-239
Page count: 13