Owl: Performance-Aware Scheduling for Resource-Efficient Function-as-a-Service Cloud

Cited by: 9

Authors:

Tian, Huangshi [1]

Li, Suyi [1]

Wang, Ao [2,3]

Wang, Wei [1]

Wu, Tianlong [3]

Yang, Haoran [3]

Affiliations:

[1] HKUST, Hong Kong, People's Republic of China

[2] George Mason University, Fairfax, VA 22030, USA

[3] Alibaba Group, Hangzhou, People's Republic of China

Source:

PROCEEDINGS OF THE 13TH SYMPOSIUM ON CLOUD COMPUTING, SOCC 2022 | 2022

Keywords:

serverless; resource-management; scheduling; overcommitment

DOI:

10.1145/3542929.3563470

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

This work documents our experience of improving the scheduler in Alibaba Function Compute, a public FaaS platform. It commences with our observation that memory and CPU are under-utilized in most FaaS sandboxes. A natural solution is to overcommit VM resources when allocating sandboxes, whereas the ensuing contention may cause performance degradation and compromise user experience. To complicate matters, the degradation in FaaS can arise from external factors, such as failed dependencies of user functions. We design Owl to achieve both high utilization and performance stability. It introduces a customizable rule system for users to specify their tolerance of degradation, and overcommits resources with a dual approach. (1) For less-invoked functions, it allocates resources to the sandboxes with a usage-based heuristic, keeps monitoring their performance, and remedies any detected degradation. It differentiates whether a degraded sandbox is affected externally by separating a contention-free environment and migrating the affected sandbox there as a comparison baseline. (2) For frequently-invoked functions, Owl profiles the interference patterns among collocated sandboxes and places the sandboxes under the guidance of the profiles. The collocation profiling is designed to tackle the constraint that profiling has to be conducted in production. Owl further consolidates idle sandboxes to reduce resource waste. We prototype Owl in our production system and implement a representative benchmark suite to evaluate it. The results demonstrate that the prototype could reduce VM cost by 43.80% and effectively mitigate latency degradation, with negligible overhead incurred.


Pages: 78-93

Page count: 16
