Lattice QCD on Intel® Xeon Phi™ Coprocessors

Cited by: 0

Authors:

Joo, Balint [1]

Kalamkar, Dhiraj D. [2]

Vaidyanathan, Karthikeyan [2]

Smelyanskiy, Mikhail [3]

Pamnany, Kiran [2]

Lee, Victor W. [3]

Dubey, Pradeep [3]

Watson, William, III [1]

Affiliations:

[1] Thomas Jefferson National Accelerator Facility, Newport News, VA 23606, USA

[2] Intel Corp., Parallel Computing Lab, Bangalore, Karnataka, India

[3] Intel Corp., Parallel Computing Lab, Santa Clara, CA, USA

Source:

SUPERCOMPUTING (ISC 2013) | 2013 / Vol. 7905

Keywords:

SOLVERS;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods];

Discipline classification code:

081202;

Abstract:

Lattice Quantum Chromodynamics (LQCD) is currently the only known model-independent, non-perturbative computational method for calculations in the theory of the strong interactions, and it is important in studies of nuclear and high energy physics. LQCD codes use large fractions of supercomputing cycles worldwide and are often among the first to be ported to new high performance computing architectures. The recently released Intel Xeon Phi architecture from Intel Corporation features parallelism at the level of many x86-based cores, multiple threads per core, and vector processing units. In this contribution, we describe our experiences with optimizing a key LQCD kernel for the Xeon Phi architecture. On a single node, using single precision, our Dslash kernel sustains a performance of up to 320 GFLOPS, while our Conjugate Gradients solver sustains up to 237 GFLOPS. Furthermore, we demonstrate a fully 'native' multi-node LQCD implementation running entirely on Knights Corner (KNC) nodes with minimum involvement of the host CPU. Our multi-node implementation of the solver has been strong scaled to 3.9 TFLOPS on 32 KNCs.


Pages: 40-54

Page count: 15
