The Workflow Trace Archive: Open-Access Data From Public and Private Computing Infrastructures

Cited by: 22
Authors
Versluis, Laurens [1]
Matha, Roland [2]
Talluri, Sacheendra [1]
Hegeman, Tim [1]
Prodan, Radu [3]
Deelman, Ewa [4]
Iosup, Alexandru [1]
Affiliations
[1] Vrije Univ Amsterdam, Comp Sci, NL-1081 HV Amsterdam, Netherlands
[2] Univ Innsbruck, Inst Comp Sci, A-6020 Innsbruck, Tyrol, Austria
[3] Univ Klagenfurt, Inst Software Technol, A-9020 Klagenfurt Am, Austria
[4] Univ Southern Calif, Informat Sci Inst, Los Angeles, CA 90292 USA
Funding
European Union Horizon 2020; US National Science Foundation
Keywords
Workflow; open-source; open-access; traces; characterization; archive; survey; simulation; CLOUD; FUTURE
DOI
10.1109/TPDS.2020.2984821
Chinese Library Classification
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Realistic, relevant, and reproducible experiments often need input traces collected from real-world environments. In this work, we focus on traces of workflows, which are common in datacenters, clouds, and HPC infrastructures. We show that the state-of-the-art in using workflow traces raises important issues: (1) the use of realistic traces is infrequent and (2) the use of realistic, open-access traces even more so. To alleviate these issues, we introduce the Workflow Trace Archive (WTA), an open-access archive of workflow traces from diverse computing infrastructures and tooling to parse, validate, and analyze traces. The WTA includes more than 48 million workflows captured from more than 10 computing infrastructures, representing a broad diversity of trace domains and characteristics. To emphasize the importance of trace diversity, we characterize the WTA contents and analyze in simulation the impact of trace diversity on experiment results. Our results indicate significant differences in characteristics, properties, and workflow structures between workload sources, domains, and fields.
Pages: 2170-2184
Page count: 15