Ditto: Efficient Serverless Analytics with Elastic Parallelism

Cited by: 5
Authors
Jin, Chao [1]
Zhang, Zili [1]
Xiang, Xingyu [1]
Zou, Songyun [1]
Huang, Gang [1]
Liu, Xuanzhe [1]
Jin, Xin [1]
Affiliations
[1] Peking University, Beijing, People's Republic of China
Keywords
Serverless computing; data analytics; task scheduling
DOI
10.1145/3603269.3604816
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Serverless computing provides fine-grained resource elasticity for data analytics: a job can flexibly scale its resources for each stage instead of sticking to a fixed pool of resources throughout its lifetime. Because stages differ in their data dependencies and in the shuffling overheads caused by intra- and inter-server communication, the best degree of parallelism (DoP) for each stage varies with runtime conditions. We present Ditto, a job scheduler for serverless analytics that leverages fine-grained resource elasticity to optimize job completion time (JCT) and cost. The key idea of Ditto is to use a new scheduling granularity, the stage group, to decouple parallelism configuration from function placement. Ditto bundles stages into stage groups based on their data dependencies and I/O characteristics. It exploits the parallelized time characteristics of the stages to determine the parallelism configuration, and it prioritizes the placement of stage groups with large shuffling traffic so that the stages in these groups can use zero-copy intra-server communication for efficient shuffling. We build a system prototype of Ditto and evaluate it with a variety of benchmarking workloads. Experimental results show that Ditto outperforms existing solutions by up to 2.5x on JCT and up to 1.8x on cost.
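
The two decisions the abstract separates, parallelism configuration and placement, can be illustrated with a minimal Python sketch. It is not Ditto's implementation. It assumes, purely for illustration, that a stage's running time follows time = alpha / d + beta for DoP d, in which case the total time of sequential stages under a fixed slot budget is minimized when each DoP is proportional to sqrt(alpha). The names `split_parallelism` and `place_groups`, the tuple layout of a stage group, and the best-fit rule are all hypothetical.

```python
import math


def split_parallelism(alphas, total_slots):
    """Split a slot budget across sequential stages.

    Assumption (not from the abstract): stage i takes alpha_i / d_i + beta_i.
    Minimizing the sum of these times with sum(d_i) fixed gives
    d_i proportional to sqrt(alpha_i). Rounding means the result may not
    add up to total_slots exactly.
    """
    roots = [math.sqrt(a) for a in alphas]
    scale = total_slots / sum(roots)
    return [max(1, round(r * scale)) for r in roots]


def place_groups(groups, free_slots):
    """Greedily place stage groups, largest shuffle traffic first.

    groups: list of (name, slots_needed, shuffle_bytes) tuples (hypothetical layout).
    free_slots: dict mapping server name to its number of free function slots.
    A group that fits on one server is put on the tightest-fitting one, so its
    stages could shuffle within that server; otherwise it maps to None.
    """
    free = dict(free_slots)
    placement = {}
    for name, need, _traffic in sorted(groups, key=lambda g: g[2], reverse=True):
        fits = [server for server, slots in free.items() if slots >= need]
        if fits:
            server = min(fits, key=lambda s: free[s])
            free[server] -= need
            placement[name] = server
        else:
            placement[name] = None  # would have to span several servers
    return placement


dops = split_parallelism([1.0, 4.0, 9.0], total_slots=12)
plan = place_groups(
    [("g1", 4, 100), ("g2", 6, 500), ("g3", 4, 50)],
    {"s1": 8, "s2": 6},
)
```

In this sketch the group with the most shuffle traffic (`g2`) is placed first, so it gets a single server before smaller groups consume the free slots.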
Pages: 406-419
Page count: 14