Hercules: Heterogeneity-Aware Inference Serving for At-Scale Personalized Recommendation

Cited by: 8
Authors
Ke, Liu [1, 2]
Gupta, Udit [1, 3]
Hempstead, Mark [4]
Wu, Carole-Jean [1]
Lee, Hsien-Hsin S. [1]
Zhang, Xuan [2]
Affiliations
[1] Meta, Menlo Pk, CA 94025 USA
[2] Washington Univ St Louis, St Louis, MO 63130 USA
[3] Harvard Univ, Cambridge, MA 02138 USA
[4] Tufts Univ, Medford, MA 02155 USA
Keywords
DOI
10.1109/HPCA53966.2022.00019
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Personalized recommendation is an important class of deep-learning applications that powers a large collection of internet services and consumes a considerable amount of datacenter resources. As the scale of production-grade recommendation systems continues to grow, optimizing their serving performance and efficiency in a heterogeneous datacenter is important and can translate into infrastructure capacity saving. In this paper, we propose Hercules, an optimized framework for personalized recommendation inference serving that targets diverse industry-representative models and cloud-scale heterogeneous systems. Hercules performs a two-stage optimization procedure - offline profiling and online serving. The first stage searches the large under-explored task scheduling space with a gradient-based search algorithm achieving up to 9.0x latency-bounded throughput improvement on individual servers; it also identifies the optimal heterogeneous server architecture for each recommendation workload. The second stage performs heterogeneity-aware cluster provisioning to optimize resource mapping and allocation in response to fluctuating diurnal loads. The proposed cluster scheduler in Hercules achieves 47.7% cluster capacity saving and reduces the provisioned power by 23.7% over a state-of-the-art greedy scheduler.
Pages: 141-154
Page count: 14
Related papers
11 items in total
  • [1] FLASH: Heterogeneity-Aware Federated Learning at Scale
    Yang, Chengxu
    Xu, Mengwei
    Wang, Qipeng
    Chen, Zhenpeng
    Huang, Kang
    Ma, Yun
    Bian, Kaigui
    Huang, Gang
    Liu, Yunxin
    Jin, Xin
    Liu, Xuanzhe
    [J]. IEEE TRANSACTIONS ON MOBILE COMPUTING, 2024, 23 (01) : 483 - 500
  • [2] Heterogeneity-aware and communication-efficient distributed statistical inference
    Duan, Rui
    Ning, Yang
    Chen, Yong
    [J]. BIOMETRIKA, 2022, 109 (01) : 67 - 83
  • [3] Heterogeneity-aware Cross-school Electives Recommendation: a Hybrid Federated Approach
    Ju, Chengyi
    Cao, Jiannong
    Yang, Yu
    Yang, Zhen-Qun
    Lee, Ho Man
    [J]. 2023 23RD IEEE INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOPS, ICDMW 2023, 2023, : 1500 - 1508
  • [4] Personalized Heterogeneity-aware Federated Search Towards Better Accuracy and Energy Efficiency
    Yang, Zhao
    Sun, Qingshuang
    [J]. 2022 IEEE/ACM INTERNATIONAL CONFERENCE ON COMPUTER AIDED DESIGN, ICCAD, 2022,
  • [5] HedgeRank: Heterogeneity-Aware, Energy-Efficient Partitioning of Personalized PageRank at the Edge
    Gong, Young-Ho
    [J]. MICROMACHINES, 2023, 14 (09)
  • [6] DeepRecSys: A System for Optimizing End-To-End At-Scale Neural Recommendation Inference
    Gupta, Udit
    Hsia, Samuel
    Saraph, Vikram
    Wang, Xiaodong
    Reagen, Brandon
    Wei, Gu-Yeon
    Lee, Hsien-Hsin S.
    Brooks, David
    Wu, Carole-Jean
    [J]. 2020 ACM/IEEE 47TH ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE (ISCA 2020), 2020, : 982 - 995
  • [7] FedCure: A Heterogeneity-Aware Personalized Federated Learning Framework for Intelligent Healthcare Applications in IoMT Environments
    Sachin, D. N.
    Annappa, B.
    Hegde, Saumya
    Abhijit, Chunduru Sri
    Ambesange, Sateesh
    [J]. IEEE ACCESS, 2024, 12 : 15867 - 15883
  • [8] Joint heterogeneity-aware personalized federated search for energy efficient battery-powered edge computing
    Yang, Zhao
    Zhang, Shengbing
    Li, Chuxi
    Wang, Miao
    Yang, Jiaying
    Zhang, Meng
    [J]. FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2023, 146 : 178 - 194
  • [9] HetSev: Exploiting Heterogeneity-Aware Autoscaling and Resource-Efficient Scheduling for Cost-Effective Machine-Learning Model Serving
    Mo, Hao
    Zhu, Ligu
    Shi, Lei
    Tan, Songfu
    Wang, Suping
    [J]. ELECTRONICS, 2023, 12 (01)
  • [10] Context-aware personalized path inference from large-scale GPS snippets
    Wang, Hongtao
    Wang, Hongmei
    Yi, Feng
    Wen, Hui
    Li, Gang
    Sun, Limin
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2018, 91 : 78 - 88