Benchmarking Data Analysis and Machine Learning Applications on the Intel KNL Many-Core Processor

Cited by: 0

Authors:

Byun, Chansup [1]

Kepner, Jeremy [1]

Arcand, William [1]

Bestor, David [1]

Bergeron, Bill [1]

Gadepally, Vijay [1]

Houle, Michael [1]

Hubbell, Matthew [1]

Jones, Michael [1]

Klein, Anna [1]

Michaleas, Peter [1]

Milechin, Lauren [1]

Mullen, Julie [1]

Prout, Andrew [1]

Rosa, Antonio [1]

Samsi, Siddharth [1]

Yee, Charles [1]

Reuther, Albert [1]

Affiliations:

[1] MIT, Lincoln Lab, Supercomp Ctr, 244 Wood St, Lexington, MA 02173 USA

Source:

2017 IEEE HIGH PERFORMANCE EXTREME COMPUTING CONFERENCE (HPEC) | 2017

Keywords:

Benchmark; MATLAB; Octave; DGEMM; throughput; performance; machine learning; Caffe; Haswell; Knights Landing;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812;

Abstract:

Knights Landing (KNL) is the code name for the second-generation Intel Xeon Phi product family. KNL has generated significant interest in the data analysis and machine learning communities because its new many-core architecture targets both of these workloads. The KNL many-core vector processor design enables it to exploit much higher levels of parallelism. At the Lincoln Laboratory Supercomputing Center (LLSC), the majority of users run data analysis applications such as MATLAB and Octave. More recently, machine learning applications, such as the UC Berkeley Caffe deep learning framework, have become increasingly important to LLSC users. Thus, the performance of these applications on KNL systems is of high interest to LLSC users and the broader data analysis and machine learning communities. Our data analysis benchmarks of these applications on the Intel KNL processor indicate that single-core double-precision generalized matrix multiply (DGEMM) performance on KNL systems has improved by approximately 3.5x compared to prior Intel Xeon technologies. Our data analysis applications also achieved approximately 60% of the theoretical peak performance. In addition, a performance comparison of a machine learning application, Caffe, between two different Intel CPUs, Xeon E5 v3 and Xeon Phi 7210, demonstrated a 2.7x improvement on a KNL node.

Pages: 6