Serving DNNs like Clockwork: Performance Predictability from the Bottom Up

Cited by: 0
Authors
Gujarati, Arpan [1 ]
Karimi, Reza [2 ]
Alzayat, Safya [1 ]
Hao, Wei [1 ]
Kaufmann, Antoine [1 ]
Vigfusson, Ymir [2 ]
Mace, Jonathan [1 ]
Affiliations
[1] Max Planck Inst Software Syst, Saarbrucken, Germany
[2] Emory Univ, Atlanta, GA 30322 USA
Funding
U.S. National Science Foundation;
Keywords
TAIL;
DOI
Not available
Chinese Library Classification (CLC)
TP31 [Computer Software];
Discipline Code
081202 ; 0835 ;
Abstract
Machine learning inference is becoming a core building block for interactive web applications. As a result, the underlying model serving systems on which these applications depend must consistently meet low-latency targets. Existing model serving architectures use well-known reactive techniques to alleviate common-case sources of latency, but cannot effectively curtail tail latency caused by unpredictable execution times. Yet the underlying execution times are not fundamentally unpredictable; on the contrary, we observe that inference using Deep Neural Network (DNN) models has deterministic performance. Here, starting with the predictable execution times of individual DNN inferences, we adopt a principled design methodology to successively build a fully distributed model serving system that achieves predictable end-to-end performance. We evaluate our implementation, Clockwork, using production trace workloads, and show that Clockwork can support thousands of models while simultaneously meeting 100 ms latency targets for 99.9999% of requests. We further demonstrate that Clockwork exploits predictable execution times to achieve tight request-level service-level objectives (SLOs) as well as a high degree of request-level performance isolation.
Pages: 443-462
Page count: 20