Serving and Optimizing Machine Learning Workflows on Heterogeneous Infrastructures

Cited by: 5
Authors
Wu, Yongji [1]
Lentz, Matthew [1]
Zhuo, Danyang [1]
Lu, Yao [2]
Affiliations
[1] Duke Univ, Durham, NC 27706 USA
[2] Microsoft Res, Redmond, WA USA
Source
PROCEEDINGS OF THE VLDB ENDOWMENT | 2022, Vol. 16, No. 3
Keywords
68;
DOI
10.14778/3570690.3570692
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
With the advent of ubiquitous deployment of smart devices and the Internet of Things, data sources for machine learning inference have increasingly moved to the edge of the network. Existing machine learning inference platforms typically assume a homogeneous infrastructure and do not take into account the more complex and tiered computing infrastructure that includes edge devices, local hubs, edge datacenters, and cloud datacenters. On the other hand, recent AutoML efforts have provided viable solutions for model compression, pruning and quantization for heterogeneous environments; for a machine learning model, now we may easily find or even generate a series of model variants with different tradeoffs between accuracy and efficiency. We design and implement JellyBean, a system for serving and optimizing machine learning inference workflows on heterogeneous infrastructures. Given service-level objectives (e.g., throughput, accuracy), JellyBean picks the most cost-efficient models that meet the accuracy target and decides how to deploy them across different tiers of infrastructures. Evaluations show that JellyBean reduces the total serving cost of visual question answering by up to 58% and vehicle tracking from the NVIDIA AI City Challenge by up to 36%, compared with state-of-the-art model selection and worker assignment solutions. JellyBean also outperforms prior ML serving systems (e.g., Spark on the cloud) by up to 5x in serving costs.
Pages: 406 - 419
Page count: 14