The Workflow Trace Archive: Open-Access Data From Public and Private Computing Infrastructures

Cited by: 22
Authors
Versluis, Laurens [1]
Matha, Roland [2]
Talluri, Sacheendra [1]
Hegeman, Tim [1]
Prodan, Radu [3]
Deelman, Ewa [4]
Iosup, Alexandru [1]
Affiliations
[1] Vrije Univ Amsterdam, Comp Sci, NL-1081 HV Amsterdam, Netherlands
[2] Univ Innsbruck, Inst Comp Sci, A-6020 Innsbruck, Tyrol, Austria
[3] Univ Klagenfurt, Inst Software Technol, A-9020 Klagenfurt Am, Austria
[4] Univ Southern Calif, Informat Sci Inst, Los Angeles, CA 90292 USA
Funding
EU Horizon 2020; US National Science Foundation
Keywords
Workflow; open-source; open-access; traces; characterization; archive; survey; simulation; CLOUD; FUTURE;
DOI
10.1109/TPDS.2020.2984821
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Realistic, relevant, and reproducible experiments often need input traces collected from real-world environments. In this work, we focus on traces of workflows, which are common in datacenters, clouds, and HPC infrastructures. We show that the state-of-the-art in using workflow traces raises important issues: (1) the use of realistic traces is infrequent and (2) the use of realistic, open-access traces even more so. Alleviating these issues, we introduce the Workflow Trace Archive (WTA), an open-access archive of workflow traces from diverse computing infrastructures and tooling to parse, validate, and analyze traces. The WTA includes >48 million workflows captured from >10 computing infrastructures, representing a broad diversity of trace domains and characteristics. To emphasize the importance of trace diversity, we characterize the WTA contents and analyze in simulation the impact of trace diversity on experiment results. Our results indicate significant differences in characteristics, properties, and workflow structures between workload sources, domains, and fields.
Pages: 2170-2184
Page count: 15