Analysis of dawnbench, a time-to-accuracy machine learning performance benchmark

Cited by: 54
Authors
Coleman C. [1]
Kang D. [1]
Narayanan D. [1]
Nardi L. [1]
Zhao T. [1]
Zhang J. [1]
Bailis P. [1]
Olukotun K. [1]
Ré C. [1]
Zaharia M. [1]
Affiliation
[1] Stanford DAWN
Source
Operating Systems Review (ACM) | 2019, Vol. 53, No. 1
Funding
U.S. National Science Foundation
Keywords
Competition; Benchmarking; Deep learning; Economic and social effects
DOI
10.1145/3352020.3352024
Abstract
Researchers have proposed hardware, software, and algorithmic optimizations to improve the computational performance of deep learning. While some of these optimizations perform the same operations faster (e.g., increasing GPU clock speed), many others modify the semantics of the training procedure (e.g., reduced precision) and can impact the final model's accuracy on unseen data. Due to a lack of standard evaluation criteria that consider these trade-offs, it is difficult to compare these optimizations directly. To address this problem, we recently introduced DAWNBENCH, a benchmark competition focused on end-to-end training time to achieve near-state-of-the-art accuracy on an unseen dataset, a combined metric called time-to-accuracy (TTA). In this work, we analyze the entries from DAWNBENCH, which received optimized submissions from multiple industrial groups, to investigate the behavior of TTA as a metric as well as trends in the best-performing entries. We show that TTA has a low coefficient of variation and that models optimized for TTA generalize nearly as well as those trained using standard methods. Additionally, even though DAWNBENCH entries were able to train ImageNet models in under 3 minutes, we find they still underutilize hardware capabilities such as Tensor Cores. Furthermore, we find that distributed entries can spend more than half of their time on communication. We show similar findings with entries to the MLPERF v0.5 benchmark. © Copyright held by the owner/author(s). Publication rights licensed to ACM.
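The combined time-to-accuracy (TTA) metric described in the abstract, and the coefficient-of-variation analysis applied to it, can be sketched as follows. This is a minimal illustration, not DAWNBENCH's actual tooling: the training logs, target accuracy, and function names are assumptions made for the example.

```python
import statistics

def time_to_accuracy(log, target):
    """Return cumulative training time (seconds) at which validation
    accuracy first reaches `target`, or None if it is never reached."""
    elapsed = 0.0
    for step_seconds, accuracy in log:
        elapsed += step_seconds
        if accuracy >= target:
            return elapsed
    return None

def coefficient_of_variation(samples):
    """CV = sample standard deviation / mean; a lower CV means the
    metric is more stable across repeated runs."""
    return statistics.stdev(samples) / statistics.mean(samples)

# Hypothetical logs of (seconds per epoch, validation accuracy)
# from three repeated runs of the same training configuration.
runs = [
    [(60, 0.80), (60, 0.91), (60, 0.94)],
    [(62, 0.79), (62, 0.92), (62, 0.95)],
    [(58, 0.81), (58, 0.90), (58, 0.94)],
]

ttas = [time_to_accuracy(run, target=0.93) for run in runs]
print(ttas)                                        # [180, 186, 174]
print(round(coefficient_of_variation(ttas), 3))    # 0.033
```

Per the paper's finding, a low CV across repeated runs is what makes TTA usable as a comparison metric despite the stochasticity of training.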
Pages: 14-25
Page count: 11
Related papers
50 in total
  • [31] Analysis of three intrusion detection system benchmark datasets using machine learning algorithms
    Kayacik, HG
    Zincir-Heywood, N
    [J]. INTELLIGENCE AND SECURITY INFORMATICS, PROCEEDINGS, 2005, 3495 : 362 - 367
  • [32] Comparative validation of machine learning algorithms for surgical workflow and skill analysis with the HeiChole benchmark
    Wagner, Martin
    Mueller-Stich, Beat-Peter
    Kisilenko, Anna
    Tran, Duc
    Heger, Patrick
    Muendermann, Lars
    Lubotsky, David M.
    Mueller, Benjamin
    Davitashvili, Tornike
    Capek, Manuela
    Reinke, Annika
    Reid, Carissa
    Yu, Tong
    Vardazaryan, Armine
    Nwoye, Chinedu Innocent
    Padoy, Nicolas
    Liu, Xinyang
    Lee, Eung-Joo
    Disch, Constantin
    Meine, Hans
    Xia, Tong
    Jia, Fucang
    Kondo, Satoshi
    Reiter, Wolfgang
    Jin, Yueming
    Long, Yonghao
    Jiang, Meirui
    Dou, Qi
    Heng, Pheng Ann
    Twick, Isabell
    Kirtac, Kadir
    Hosgor, Enes
    Bolmgren, Jon Lindstro
    Stenzel, Michael
    von Siemens, Bjorn
    Zhao, Long
    Ge, Zhenxiao
    Sun, Haiming
    Xie, Di
    Guo, Mengqi
    Liu, Daochang
    Kenngott, Hannes G.
    Nickel, Felix
    von Frankenberg, Moritz
    Mathis-Ullrich, Franziska
    Kopp-Schneider, Annette
    Maier-Hein, Lena
    Speidel, Stefanie
    Bodenstedt, Sebastian
    [J]. MEDICAL IMAGE ANALYSIS, 2023, 86
  • [33] Open Graph Benchmark: Datasets for Machine Learning on Graphs
    Hu, Weihua
    Fey, Matthias
    Zitnik, Marinka
    Dong, Yuxiao
    Ren, Hongyu
    Liu, Bowen
    Catasta, Michele
    Leskovec, Jure
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [34] Temporal Graph Benchmark for Machine Learning on Temporal Graphs
    Huang, Shenyang
    Poursafaei, Farimah
    Danovitch, Jacob
    Fey, Matthias
    Hu, Weihua
    Rossi, Emanuele
    Leskovec, Jure
    Bronstein, Michael
    Rabusseau, Guillaume
    Rabbany, Reihaneh
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [35] Keepaway soccer: From machine learning testbed to benchmark
    Stone, Peter
    Kuhlmann, Gregory
    Taylor, Matthew E.
    Liu, Yaxin
    [J]. ROBOCUP 2005: ROBOT SOCCER WORLD CUP IX, 2006, 4020 : 93 - 105
  • [36] A benchmark of machine learning approaches for credit score prediction
    Moscato, Vincenzo
    Picariello, Antonio
    Sperli, Giancarlo
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2021, 165
  • [37] A Machine Learning Benchmark with Meaning: Learnability and Verb Semantics
    Veres, Csaba
    Sandblast, Bjorn Helge
    [J]. AI 2019: ADVANCES IN ARTIFICIAL INTELLIGENCE, 2019, 11919 : 369 - 380
  • [38] LEAD TIME AND ACCURACY OF TREWS, A MACHINE LEARNING-BASED SEPSIS ALERT
    Saria, Suchi
    Henry, Katharine
    Soleimani, Hossein
    Adams, Roy
    Zhan, Andong
    Rawat, Nishi
    Chen, Edward
    Wu, Albert
    [J]. CRITICAL CARE MEDICINE, 2022, 50 (01) : 717 - 717
  • [39] Analyzing EEG Data with Machine and Deep Learning: A Benchmark
    Avola, Danilo
    Cascio, Marco
    Cinque, Luigi
    Fagioli, Alessio
    Foresti, Gian Luca
    Marini, Marco Raoul
    Pannone, Daniele
    [J]. IMAGE ANALYSIS AND PROCESSING, ICIAP 2022, PT I, 2022, 13231 : 335 - 345
  • [40] Modern Machine Learning as a Benchmark for Fitting Neural Responses
    Benjamin, Ari S.
    Fernandes, Hugo L.
    Tomlinson, Tucker
    Ramkumar, Pavan
    VerSteeg, Chris
    Chowdhury, Raeed H.
    Miller, Lee E.
    Kording, Konrad P.
    [J]. FRONTIERS IN COMPUTATIONAL NEUROSCIENCE, 2018, 12