A Seer Knows Best: Optimized Object Storage Shuffling for Serverless Analytics

Cited by: 4
Authors
Sanchez-Artigas, Marc [1]
Eizaguirre, German T. [1]
Affiliations
[1] Univ Rovira & Virgili, Tarragona, Spain
Keywords
Serverless computing; Shuffle; I/O optimization; Object storage
DOI
10.1145/3528535.3565241
Chinese Library Classification (CLC) number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
Serverless platforms offer high resource elasticity and pay-as-you-go billing, making them a compelling choice for data analytics. To craft a "pure" serverless solution, the common practice is to transfer intermediate data between serverless functions via serverless object storage (IBM COS; AWS S3). However, prior works have reached inconclusive results about the performance of object storage, since they have left a large margin for optimization. To verify that object storage has been underrated, we design a novel shuffle manager for serverless data analytics termed Seer. Specifically, Seer dynamically chooses between two shuffle algorithms to maximize performance. The algorithm choice is based on predictive models and, importantly, does not require users to specify intermediate data sizes at job submission time. We integrate Seer with PyWren-IBM [31], a serverless analytics framework, and evaluate it against both serverful (e.g., Spark) and serverless systems (e.g., Google BigQuery). Our results show that our new shuffle manager can deliver performance improvements over them.
Pages: 148-160
Page count: 13
Related papers
13 in total
  • [1] A Seer knows best: Auto-tuned object storage shuffling for serverless analytics
    Eizaguirre, German T.
    Sanchez-Artigas, Marc
    JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2024, 183
  • [2] Shuffling, Fast and Slow: Scalable Analytics on Serverless Infrastructure
    Pu, Qifan
    Venkataraman, Shivaram
    Stoica, Ion
    PROCEEDINGS OF THE 16TH USENIX SYMPOSIUM ON NETWORKED SYSTEMS DESIGN AND IMPLEMENTATION, 2019: 193-206
  • [3] Understanding Ephemeral Storage for Serverless Analytics
    Klimovic, Ana
    Wang, Yawen
    Kozyrakis, Christos
    Stuedi, Patrick
    Pfefferle, Jonas
    Trivedi, Animesh
    PROCEEDINGS OF THE 2018 USENIX ANNUAL TECHNICAL CONFERENCE, 2018: 789-794
  • [4] Pocket: Elastic Ephemeral Storage for Serverless Analytics
    Klimovic, Ana
    Wang, Yawen
    Stuedi, Patrick
    Trivedi, Animesh
    Pfefferle, Jonas
    Kozyrakis, Christos
    PROCEEDINGS OF THE 13TH USENIX SYMPOSIUM ON OPERATING SYSTEMS DESIGN AND IMPLEMENTATION, 2018: 427-444
  • [5] Data-Driven Serverless Functions for Object Storage
    Sampe, Josep
    Sanchez-Artigas, Marc
    Garcia-Lopez, Pedro
    Paris, Gerard
    PROCEEDINGS OF THE 2017 INTERNATIONAL MIDDLEWARE CONFERENCE (MIDDLEWARE'17), 2017: 121-133
  • [6] Towards Memory-Optimized Data Shuffling Patterns for Big Data Analytics
    Nicolae, Bogdan
    Costa, Carlos
    Misale, Claudia
    Katrinis, Kostas
    Park, Yoonho
    2016 16TH IEEE/ACM INTERNATIONAL SYMPOSIUM ON CLUSTER, CLOUD AND GRID COMPUTING (CCGRID), 2016: 409-412
  • [7] SwiftAnalytics: Optimizing Object Storage for Big Data Analytics
    Rupprecht, Lukas
    Zhang, Rui
    Owen, Bill
    Pietzuch, Peter
    Hildebrand, Dean
    2017 IEEE INTERNATIONAL CONFERENCE ON CLOUD ENGINEERING (IC2E 2017), 2017: 245-251
  • [8] Is Performance of Object Storage Predictable for Serverless I/O Workloads? A Comparative Study
    Eizaguirre, German T.
    Sanchez-Artigas, Marc
    2023 IEEE 31ST INTERNATIONAL CONFERENCE ON NETWORK PROTOCOLS, ICNP, 2023
  • [9] Exploiting Cloud Object Storage for High-Performance Analytics
    Durner, Dominik
    Leis, Viktor
    Neumann, Thomas
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2023, 16 (11): 2769-2782
  • [10] A Combined Optimized WLAN Communication Security Algorithm Based on Distributed Object Storage
    Pu, Zaiyi
    BASIC & CLINICAL PHARMACOLOGY & TOXICOLOGY, 2019, 124: 291-292