Portable HPC Programming on Intel Many-Integrated-Core Hardware with MAGMA Port to Xeon Phi

Cited by: 7
Authors
Dongarra, Jack [1]
Gates, Mark [1]
Haidar, Azzam [1]
Jia, Yulu [1]
Kabir, Khairul [1]
Luszczek, Piotr [1]
Tomov, Stanimire [1]
Affiliation
[1] Univ Tennessee, Knoxville, TN 37996 USA
Keywords
Numerical linear algebra; Intel Xeon Phi processor; Many Integrated Cores; Hardware accelerators and coprocessors; Dynamic runtime scheduling using dataflow dependences; Communication and computation overlap;
DOI
10.1007/978-3-642-55224-3_53
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
This paper presents the design and implementation of several fundamental dense linear algebra (DLA) algorithms for multicore systems with Intel Xeon Phi coprocessors. In particular, we consider algorithms for solving linear systems. Further, we give an overview of the MAGMA MIC library, an open source, high-performance library that incorporates the developments presented and, more generally, provides the DLA functionality of the popular LAPACK library on heterogeneous architectures that combine multicore CPUs with coprocessors. LAPACK compliance simplifies the use of the MAGMA MIC library in applications while providing them with portable, high-performance DLA. High performance is obtained through the use of high-performance BLAS, hardware-specific tuning, and a hybridization methodology in which we split the algorithm into computational tasks of various granularities. Execution of those tasks is scheduled over the heterogeneous hardware components so as to minimize data movement and map algorithmic requirements to the architectural strengths of each component. Our methodology and programming techniques are incorporated into the MAGMA MIC API, which shields the application developer from the specifics of the Xeon Phi architecture and is therefore applicable to algorithms beyond the scope of DLA.
Pages: 571-581
Number of pages: 11
Related papers
13 in total
  • [1] HPC Programming on Intel Many-Integrated-Core Hardware with MAGMA Port to Xeon Phi
    Dongarra, Jack
    Gates, Mark
    Haidar, Azzam
    Jia, Yulu
    Kabir, Khairul
    Luszczek, Piotr
    Tomov, Stanimire
    SCIENTIFIC PROGRAMMING, 2015, 2015
  • [2] OpenJDK Meets Xeon Phi: A Comprehensive Study of Java HPC on Intel Many-core Architecture
    Yu, Yang
    Lei, Tianyang
    Chen, Haibo
    Zang, Binyu
    2015 44TH INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING WORKSHOPS, 2015, : 156 - 165
  • [3] mAMBER: Accelerating explicit solvent molecular dynamic with Intel Xeon Phi Many-Integrated Core Coprocessors
    Liu, Xin
    Peng, Shaoliang
    Yang, Canqun
    Wu, Chengkun
    Wang, Haiqiang
    Cheng, Qian
    Zhu, Weiliang
    Wang, Jinan
    2015 15TH IEEE/ACM INTERNATIONAL SYMPOSIUM ON CLUSTER, CLOUD AND GRID COMPUTING, 2015, : 729 - 732
  • [4] Evaluating the Support of MTC Applications On Intel Xeon Phi Many-Core Accelerators
    Nookala, Poornima
    Dimitropoulos, Serapheim
    Stough, Karl
    Raicu, Ioan
    2015 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING - CLUSTER 2015, 2015, : 510 - 511
  • [5] Accelerating Time Series Subsequence Matching on the Intel Xeon Phi Many-core Coprocessor
    Miniakhmetov, Ruslan
    Movchan, Aleksander
    Zymbler, Mikhail
    2015 8TH INTERNATIONAL CONVENTION ON INFORMATION AND COMMUNICATION TECHNOLOGY, ELECTRONICS AND MICROELECTRONICS (MIPRO), 2015, : 1399 - 1404
  • [6] ELT-scale adaptive optics real-time control with the Intel Xeon Phi Many Integrated Core Architecture
    Jenkins, David R.
    Basden, Alastair
    Myers, Richard M.
    MONTHLY NOTICES OF THE ROYAL ASTRONOMICAL SOCIETY, 2018, 478 (03) : 3149 - 3158
  • [7] Efficient Query Processing on Many-core Architectures: A Case Study with Intel Xeon Phi Processor
    Cheng, Xuntao
    He, Bingsheng
    Lu, Mian
    Lau, Chiew Tong
    Huynh Phung Huynh
    Goh, Rick Siow Mong
    SIGMOD'16: PROCEEDINGS OF THE 2016 INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2016, : 2081 - 2084
  • [8] Training Large Scale Deep Neural Networks on the Intel Xeon Phi Many-core Coprocessor
    Jin, Lei
    Wang, Zhaokang
    Gu, Rong
    Yuan, Chunfeng
    Huang, Yihua
    PROCEEDINGS OF 2014 IEEE INTERNATIONAL PARALLEL & DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW), 2014, : 1622 - 1630
  • [9] Optimizing Cache Locality for Irregular Data Accesses on Many-Core Intel Xeon Phi Accelerator Chip
    Nhat-Phuong Tran
    Choi, Dong Hoon
    Lee, Myungho
    2014 IEEE INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS, 2014 IEEE 6TH INTL SYMP ON CYBERSPACE SAFETY AND SECURITY, 2014 IEEE 11TH INTL CONF ON EMBEDDED SOFTWARE AND SYST (HPCC,CSS,ICESS), 2014, : 153 - 156
  • [10] Many-core needs fine-grained scheduling: A case study of query processing on Intel Xeon Phi processors
    Cheng, Xuntao
    He, Bingsheng
    Lu, Mian
    Lau, Chiew Tong
    JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2018, 120 : 395 - 404