A lightweight performance proxy for deep-learning model training on Amazon SageMaker

Cited by: 0
Authors
Tesser, Rafael Keller [1, 2, 3]
Marques, Alvaro [2]
Borin, Edson [2]
Affiliations
[1] Univ Campinas Unicamp, Ctr Comp Engn & Sci, Sao Paulo, Brazil
[2] Univ Campinas Unicamp, Inst Comp, Sao Paulo, Brazil
[3] Fed Univ Technol Parana UTFPR, Bachelors Course Comp Sci, Santa Helena, PR, Brazil
Source
Keywords
cloud computing; cost prediction; deep learning; machine learning; performance prediction
DOI
10.1002/cpe.8104
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
Cloud computing has become popular for training deep-learning (DL) models, avoiding the costs of acquiring and maintaining on-premise systems. SageMaker is a cloud service that automates the execution of DL workloads. Its features include automatic hyperparameter optimization and the use of spot instances. Nonetheless, it does not assist in selecting the right instance type for a workload. In public clouds, the rental price depends on the configuration of the chosen instance type. More advanced, faster instances are typically more expensive but are not always the best choice. To select the optimal instance type, users must compare the workload's relative performance (and hence cost) on several candidates. Building on the execution profiles of multiple DL applications, we model the performance and cost of training DL applications on SageMaker and propose a lightweight technique to estimate both at low temporal and monetary cost. This method is a performance proxy that can replace more expensive performance measurement procedures, so it could speed up any technique that relies on such measurements. We show how it can help cloud customers seeking suitable instance types to train DL models, and that it can accurately predict the performance of different instance types when training these models on SageMaker.
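The comparison the abstract describes (relative performance, and hence cost, across candidate instance types) comes down to simple arithmetic once a per-epoch time estimate is available for each candidate. The sketch below is only an illustration of that arithmetic, not the authors' model, which this record does not describe. The instance names, hourly prices, and per-epoch times are made-up placeholders, and `Candidate`, `training_cost`, and `cheapest` are hypothetical helper names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One candidate instance type (all values are illustrative placeholders)."""
    name: str
    price_per_hour: float     # USD per hour, hypothetical
    seconds_per_epoch: float  # estimated time per training epoch, hypothetical


def training_cost(candidate: Candidate, epochs: int) -> float:
    """Estimated cost in USD: (estimated hours of training) * (hourly price)."""
    hours = candidate.seconds_per_epoch * epochs / 3600.0
    return hours * candidate.price_per_hour


def cheapest(candidates: list[Candidate], epochs: int) -> Candidate:
    """Return the candidate with the lowest estimated total training cost."""
    return min(candidates, key=lambda c: training_cost(c, epochs))


# Made-up numbers chosen so that neither the lowest hourly price
# nor the fastest instance gives the lowest total cost.
CANDIDATES = [
    Candidate("instance-a", price_per_hour=1.00, seconds_per_epoch=720.0),
    Candidate("instance-b", price_per_hour=4.00, seconds_per_epoch=120.0),
    Candidate("instance-c", price_per_hour=12.00, seconds_per_epoch=45.0),
]

if __name__ == "__main__":
    for c in CANDIDATES:
        print(f"{c.name}: ${training_cost(c, epochs=10):.2f}")
    print("lowest estimated cost:", cheapest(CANDIDATES, epochs=10).name)
```

With these placeholder values, the mid-priced instance has the lowest estimated cost for a 10-epoch run, which mirrors the abstract's point that faster, more expensive instances are not always the best choice. In practice the per-epoch times would come from measurements or from a proxy such as the one the paper proposes.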
Pages: 22