Benchmarking Data Analysis and Machine Learning Applications on the Intel KNL Many-Core Processor

Cited by: 0

Authors:

Byun, Chansup [1]

Kepner, Jeremy [1]

Arcand, William [1]

Bestor, David [1]

Bergeron, Bill [1]

Gadepally, Vijay [1]

Houle, Michael [1]

Hubbell, Matthew [1]

Jones, Michael [1]

Klein, Anna [1]

Michaleas, Peter [1]

Milechin, Lauren [1]

Mullen, Julie [1]

Prout, Andrew [1]

Rosa, Antonio [1]

Samsi, Siddharth [1]

Yee, Charles [1]

Reuther, Albert [1]

Affiliation:

[1] MIT, Lincoln Lab, Supercomp Ctr, 244 Wood St, Lexington, MA 02173 USA

Source:

2017 IEEE High Performance Extreme Computing Conference (HPEC) | 2017

Keywords:

Benchmark; MATLAB; Octave; DGEMM; throughput; performance; machine learning; Caffe; Haswell; Knights Landing

DOI:

Not available

Chinese Library Classification:

TP3 [Computing technology, computer technology]

Discipline classification code:

0812

Abstract:

Knights Landing (KNL) is the code name for the second-generation Intel Xeon Phi product family. KNL has generated significant interest in the data analysis and machine learning communities because its new many-core architecture targets both of these workloads. The KNL many-core vector processor design enables it to exploit much higher levels of parallelism. At the Lincoln Laboratory Supercomputing Center (LLSC), the majority of users run data analysis applications such as MATLAB and Octave. More recently, machine learning applications, such as the UC Berkeley Caffe deep learning framework, have become increasingly important to LLSC users. Thus, the performance of these applications on KNL systems is of high interest to LLSC users and to the broader data analysis and machine learning communities. Our data analysis benchmarks of these applications on the Intel KNL processor indicate that single-core double-precision generalized matrix multiply (DGEMM) performance on KNL systems has improved by ~3.5x compared to prior Intel Xeon technologies. Our data analysis applications also achieved ~60% of the theoretical peak performance. In addition, a performance comparison of a machine learning application, Caffe, between two different Intel CPUs, the Xeon E5 v3 and the Xeon Phi 7210, demonstrated a 2.7x improvement on a KNL node.
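The abstract reports DGEMM results as a fraction of theoretical peak performance. As a minimal sketch of how such a figure is commonly derived (not the authors' benchmark code), the Python below uses the conventional 2*m*n*k operation count for a matrix multiply and a simple cores x clock x FLOPs-per-cycle peak model. All function names are hypothetical, and the numbers in the usage section are placeholder values chosen for illustration, not measurements from the paper.

```python
def dgemm_flops(m, n, k):
    """Floating-point operations for C = A * B with A (m x k) and B (k x n),
    counting one multiply and one add per inner-product term."""
    return 2 * m * n * k


def achieved_gflops(m, n, k, seconds):
    """Achieved rate in GFLOPS for one DGEMM call that took `seconds`."""
    return dgemm_flops(m, n, k) / seconds / 1e9


def theoretical_peak_gflops(cores, clock_ghz, flops_per_cycle_per_core):
    """Simple peak model: cores x clock (GHz) x FLOPs per cycle per core."""
    return cores * clock_ghz * flops_per_cycle_per_core


def fraction_of_peak(achieved, peak):
    """Ratio of achieved to theoretical peak performance."""
    return achieved / peak


if __name__ == "__main__":
    # Placeholder inputs (assumptions for illustration only).
    peak = theoretical_peak_gflops(cores=4, clock_ghz=2.0,
                                   flops_per_cycle_per_core=16)
    rate = achieved_gflops(1000, 1000, 1000, seconds=0.025)
    print(f"peak = {peak:.1f} GFLOPS, achieved = {rate:.1f} GFLOPS, "
          f"fraction = {fraction_of_peak(rate, peak):.3f}")
```

In practice the elapsed time would come from timing a tuned BLAS call over several repetitions, and the FLOPs-per-cycle term depends on the vector width and fused multiply-add support of the processor being measured.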


Pages: 6
