The Globus Compute dataset: An open function-as-a-service dataset from the edge to the cloud

Cited by: 7
Authors
Bauer, Andre [1, 2]
Pan, Haochen [1]
Chard, Ryan [2]
Babuji, Yadu [1]
Bryan, Josh [1]
Tiwari, Devesh [3]
Foster, Ian [1, 2]
Chard, Kyle [1, 2]
Affiliations
[1] Univ Chicago, Chicago, IL 60637 USA
[2] Argonne Natl Lab, Argonne, IL USA
[3] Northeastern Univ, Boston, MA 02138 USA
Funding
US National Science Foundation;
Keywords
Serverless computing; Globus compute; FAIR dataset; Computing continuum;
DOI
10.1016/j.future.2023.12.007
Chinese Library Classification
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
We present a unique function-as-a-service (FaaS) dataset capturing the use of the Globus Compute (previously funcX) platform. Globus Compute implements a federated model in which users may deploy endpoints on arbitrary remote computers, from the edge to high-performance computing (HPC) clusters, and then invoke Python functions on those endpoints via a reliable cloud-hosted service. The dataset covers 31 weeks and includes 2,121,472 task submissions from 252 users executed on 580 remote computing endpoints. It includes 277,386 registered functions. We describe the dataset and various observations. Some are similar to those from other FaaS datasets, for example, that 74% of tasks run for less than 1 s. Others are unique to Globus Compute, for example, that endpoints are used in different ways and that the majority of functions are related to scientific computing and machine learning. To the best of our knowledge, this dataset represents the first federated FaaS dataset that includes user workloads, distributed computing endpoints, and analysis of registered function bodies. We expect the dataset to be useful for researching FaaS architectures, workload modeling, container warming, and other distributed computing architectures.
Pages: 558-574
Page count: 17