Portable HPC Programming on Intel Many-Integrated-Core Hardware with MAGMA Port to Xeon Phi

Cited by: 7

Authors:

Dongarra, Jack [1]

Gates, Mark [1]

Haidar, Azzam [1]

Jia, Yulu [1]

Kabir, Khairul [1]

Luszczek, Piotr [1]

Tomov, Stanimire [1]

Affiliations:

[1] Univ Tennessee, Knoxville, TN 37996 USA

Source:

PARALLEL PROCESSING AND APPLIED MATHEMATICS (PPAM 2013), PT I | 2014 / Vol. 8384

Keywords:

Numerical linear algebra; Intel Xeon Phi processor; Many Integrated Cores; Hardware accelerators and coprocessors; Dynamic runtime scheduling using dataflow dependences; Communication and computation overlap

DOI:

10.1007/978-3-642-55224-3_53

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods]

Subject classification code:

081202

Abstract:

This paper presents the design and implementation of several fundamental dense linear algebra (DLA) algorithms for multicore with Intel Xeon Phi Coprocessors. In particular, we consider algorithms for solving linear systems. Further, we give an overview of the MAGMA MIC library, an open source, high performance library that incorporates the developments presented, and in general provides to heterogeneous architectures of multicore with coprocessors the DLA functionality of the popular LAPACK library. The LAPACK-compliance simplifies the use of the MAGMA MIC library in applications, while providing them with portably performant DLA. High performance is obtained through use of the high-performance BLAS, hardware-specific tuning, and a hybridization methodology where we split the algorithm into computational tasks of various granularities. Execution of those tasks is properly scheduled over the heterogeneous hardware components by minimizing data movements and mapping algorithmic requirements to the architectural strengths of the various heterogeneous hardware components. Our methodology and programming techniques are incorporated into the MAGMA MIC API, which abstracts the application developer from the specifics of the Xeon Phi architecture and is therefore applicable to algorithms beyond the scope of DLA.

Citation

Pages: 571-581

Page count: 11

13 records in total

[1] HPC Programming on Intel Many-Integrated-Core Hardware with MAGMA Port to Xeon Phi
Dongarra, Jack
Gates, Mark
Haidar, Azzam
Jia, Yulu
Kabir, Khairul
Luszczek, Piotr
Tomov, Stanimire
SCIENTIFIC PROGRAMMING, 2015, 2015
[2] OpenJDK Meets Xeon Phi: A Comprehensive Study of Java HPC on Intel Many-core Architecture
Yu, Yang
Lei, Tianyang
Chen, Haibo
Zang, Binyu
2015 44TH INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING WORKSHOPS, 2015, : 156 - 165
[3] mAMBER: Accelerating explicit solvent molecular dynamic with Intel Xeon Phi Many-Integrated Core Coprocessors
Liu, Xin
Peng, Shaoliang
Yang, Canqun
Wu, Chengkun
Wang, Haiqiang
Cheng, Qian
Zhu, Weiliang
Wang, Jinan
2015 15TH IEEE/ACM INTERNATIONAL SYMPOSIUM ON CLUSTER, CLOUD AND GRID COMPUTING, 2015, : 729 - 732
[4] Evaluating the Support of MTC Applications On Intel Xeon Phi Many-Core Accelerators
Nookala, Poornima
Dimitropoulos, Serapheim
Stough, Karl
Raicu, Ioan
2015 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING - CLUSTER 2015, 2015, : 510 - 511
[5] Accelerating Time Series Subsequence Matching on the Intel Xeon Phi Many-core Coprocessor
Miniakhmetov, Ruslan
Movchan, Aleksander
Zymbler, Mikhail
2015 8TH INTERNATIONAL CONVENTION ON INFORMATION AND COMMUNICATION TECHNOLOGY, ELECTRONICS AND MICROELECTRONICS (MIPRO), 2015, : 1399 - 1404
[6] ELT-scale adaptive optics real-time control with the Intel Xeon Phi Many Integrated Core Architecture
Jenkins, David R.
Basden, Alastair
Myers, Richard M.
MONTHLY NOTICES OF THE ROYAL ASTRONOMICAL SOCIETY, 2018, 478 (03) : 3149 - 3158
[7] Efficient Query Processing on Many-core Architectures: A Case Study with Intel Xeon Phi Processor
Cheng, Xuntao
He, Bingsheng
Lu, Mian
Lau, Chiew Tong
Huynh Phung Huynh
Goh, Rick Siow Mong
SIGMOD'16: PROCEEDINGS OF THE 2016 INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2016, : 2081 - 2084
[8] Training Large Scale Deep Neural Networks on the Intel Xeon Phi Many-core Coprocessor
Jin, Lei
Wang, Zhaokang
Gu, Rong
Yuan, Chunfeng
Huang, Yihua
PROCEEDINGS OF 2014 IEEE INTERNATIONAL PARALLEL & DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW), 2014, : 1622 - 1630
[9] Optimizing Cache Locality for Irregular Data Accesses on Many-Core Intel Xeon Phi Accelerator Chip
Nhat-Phuong Tran
Choi, Dong Hoon
Lee, Myungho
2014 IEEE INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS, 2014 IEEE 6TH INTL SYMP ON CYBERSPACE SAFETY AND SECURITY, 2014 IEEE 11TH INTL CONF ON EMBEDDED SOFTWARE AND SYST (HPCC,CSS,ICESS), 2014, : 153 - 156
[10] Many-core needs fine-grained scheduling: A case study of query processing on Intel Xeon Phi processors
Cheng, Xuntao
He, Bingsheng
Lu, Mian
Lau, Chiew Tong
JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2018, 120 : 395 - 404
