Analysis of DAWNBench, a Time-to-Accuracy Machine Learning Performance Benchmark

Cited by: 54
Authors
Coleman C. [1 ]
Kang D. [1 ]
Narayanan D. [1 ]
Nardi L. [1 ]
Zhao T. [1 ]
Zhang J. [1 ]
Bailis P. [1 ]
Olukotun K. [1 ]
Ré C. [1 ]
Zaharia M. [1 ]
Affiliations
[1] Stanford DAWN
Source
Operating Systems Review (ACM) | 2019 / Vol. 53 / No. 1
Funding
U.S. National Science Foundation
Keywords
Competition; Benchmarking; Deep learning; Economic and social effects
DOI
10.1145/3352020.3352024
Abstract
Researchers have proposed hardware, software, and algorithmic optimizations to improve the computational performance of deep learning. While some of these optimizations perform the same operations faster (e.g., increasing GPU clock speed), many others modify the semantics of the training procedure (e.g., reduced precision) and can impact the final model's accuracy on unseen data. Due to a lack of standard evaluation criteria that consider these trade-offs, it is difficult to compare these optimizations directly. To address this problem, we recently introduced DAWNBENCH, a benchmark competition focused on end-to-end training time to achieve near-state-of-the-art accuracy on an unseen dataset, a combined metric called time-to-accuracy (TTA). In this work, we analyze the entries from DAWNBENCH, which received optimized submissions from multiple industrial groups, to investigate the behavior of TTA as a metric as well as trends in the best-performing entries. We show that TTA has a low coefficient of variation and that models optimized for TTA generalize nearly as well as those trained using standard methods. Additionally, even though DAWNBENCH entries were able to train ImageNet models in under 3 minutes, we find they still underutilize hardware capabilities such as Tensor Cores. Furthermore, we find that distributed entries can spend more than half of their time on communication. We show similar findings with entries to the MLPERF v0.5 benchmark. © Copyright held by the owner/author(s). Publication rights licensed to ACM.
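The time-to-accuracy metric and the coefficient-of-variation analysis described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's actual measurement harness: the function names, the `(elapsed_seconds, accuracy)` log format, and all numbers are hypothetical assumptions.

```python
import statistics

def time_to_accuracy(log, threshold):
    """Return the elapsed time (seconds) at which validation accuracy
    first reaches `threshold`, or None if it never does.

    `log` is a chronological list of (elapsed_seconds, accuracy) tuples,
    one per evaluation point during training."""
    for elapsed, acc in log:
        if acc >= threshold:
            return elapsed
    return None

def coefficient_of_variation(samples):
    """Sample standard deviation divided by the mean: a scale-free
    measure of run-to-run variability of a metric such as TTA."""
    return statistics.stdev(samples) / statistics.mean(samples)

# Hypothetical training log: (seconds elapsed, validation accuracy).
run = [(600, 0.71), (1200, 0.88), (1800, 0.925), (2400, 0.941)]
print(time_to_accuracy(run, 0.92))  # first eval point at or above 92%

# Hypothetical TTA samples from repeated runs of the same entry;
# a small CV means the metric is stable across runs.
ttas = [1800, 1860, 1770, 1830]
print(round(coefficient_of_variation(ttas), 3))
```

Measuring the end-to-end wall-clock time to a fixed accuracy target, rather than per-iteration throughput, is what lets the metric penalize optimizations that speed up each step but degrade final generalization.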
Pages: 14-25
Page count: 11