Serving DNNs like Clockwork: Performance Predictability from the Bottom Up

Authors
Gujarati, Arpan [1]
Karimi, Reza [2]
Alzayat, Safya [1]
Hao, Wei [1]
Kaufmann, Antoine [1]
Vigfusson, Ymir [2]
Mace, Jonathan [1]
Affiliations
[1] Max Planck Inst Software Syst, Saarbrucken, Germany
[2] Emory Univ, Atlanta, GA 30322 USA
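Funding
National Science Foundation (USA)
Keywords
TAIL
DOI
Not available
Chinese Library Classification number
TP31 [Computer software]
Discipline classification codes
081202; 0835
Abstract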
Machine learning inference is becoming a core building block for interactive web applications. As a result, the underlying model serving systems on which these applications depend must consistently meet low latency targets. Existing model serving architectures use well-known reactive techniques to alleviate common-case sources of latency, but cannot effectively curtail tail latency caused by unpredictable execution times. Yet the underlying execution times are not fundamentally unpredictable; on the contrary, we observe that inference using Deep Neural Network (DNN) models has deterministic performance. Here, starting with the predictable execution times of individual DNN inferences, we adopt a principled design methodology to successively build a fully distributed model serving system that achieves predictable end-to-end performance. We evaluate our implementation, Clockwork, using production trace workloads, and show that Clockwork can support thousands of models while simultaneously meeting 100 ms latency targets for 99.9999% of requests. We further demonstrate that Clockwork exploits predictable execution times to achieve tight request-level service-level objectives (SLOs) as well as a high degree of request-level performance isolation.
Pages: 443-462
Page count: 20