A data dependency based strategy for intermediate data storage in scientific cloud workflow systems

Cited by: 45
Authors
Yuan, Dong [1]
Yang, Yun [1]
Liu, Xiao [1]
Zhang, Gaofeng [1]
Chen, Jinjun [1]
Affiliations
[1] Swinburne Univ Technol, Fac Informat & Commun Technol, Melbourne, Vic 3122, Australia
Source
Funding
Australian Research Council
Keywords
data sets storage; cloud computing; scientific workflow
DOI
10.1002/cpe.1636
Chinese Library Classification number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
Many scientific workflows are data intensive where large volumes of intermediate data are generated during their execution. Some valuable intermediate data need to be stored for sharing or reuse. Traditionally, they are selectively stored according to the system storage capacity, determined manually. As doing science in the cloud has become popular nowadays, more intermediate data can be stored in scientific cloud workflows based on a pay-for-use model. In this paper, we build an intermediate data dependency graph (IDG) from the data provenance in scientific workflows. With the IDG, deleted intermediate data can be regenerated, and as such we develop a novel intermediate data storage strategy that can reduce the cost of scientific cloud workflow systems by automatically storing appropriate intermediate data sets with one cloud service provider. The strategy has significant research merits, i.e. it achieves a cost-effective trade-off of computation cost and storage cost and is not strongly impacted by the forecasting inaccuracy of data sets' usages. Meanwhile, the strategy also takes the users' tolerance of data accessing delay into consideration. We utilize Amazon's cost model and apply the strategy to general random as well as specific astrophysics pulsar searching scientific workflows for evaluation. The results show that our strategy can reduce the overall cost of scientific cloud workflow execution significantly. Copyright (C) 2010 John Wiley & Sons, Ltd.
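The storage-versus-regeneration trade-off described in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: the greedy rule, the `DataSet` fields, the function names, and both prices are assumptions made for illustration (the prices are placeholders, not Amazon's actual rates). A data set is kept only if regenerating it on demand, including any deleted predecessors in the dependency graph, would cost more per month than storing it.

```python
from dataclasses import dataclass, field

# Placeholder prices, assumed for illustration only (not Amazon's actual rates).
STORAGE_PRICE_PER_GB_MONTH = 0.15  # $ per GB per month
COMPUTE_PRICE_PER_HOUR = 0.10      # $ per compute hour


@dataclass
class DataSet:
    name: str
    size_gb: float
    gen_hours: float       # compute time to derive it from its direct predecessors
    uses_per_month: float  # forecast usage rate
    preds: list = field(default_factory=list)  # direct predecessors in the graph


def regeneration_cost(name, datasets, stored):
    """Compute cost of regenerating `name`, walking back through deleted predecessors."""
    needed, stack = set(), [name]
    while stack:
        cur = stack.pop()
        if cur in needed:
            continue
        needed.add(cur)
        # A stored predecessor is a valid starting point; a deleted one must be rebuilt too.
        stack.extend(p for p in datasets[cur].preds if p not in stored)
    return sum(datasets[n].gen_hours for n in needed) * COMPUTE_PRICE_PER_HOUR


def choose_stored(datasets, order):
    """Greedy pass in topological order: store when regeneration costs more than storage."""
    stored = set()
    for name in order:
        ds = datasets[name]
        storage_rate = ds.size_gb * STORAGE_PRICE_PER_GB_MONTH
        regen_rate = regeneration_cost(name, datasets, stored) * ds.uses_per_month
        if regen_rate > storage_rate:
            stored.add(name)
    return stored


# Hypothetical linear workflow d1 -> d2 -> d3.
datasets = {
    "d1": DataSet("d1", size_gb=100, gen_hours=10, uses_per_month=0.5),
    "d2": DataSet("d2", size_gb=1, gen_hours=20, uses_per_month=2, preds=["d1"]),
    "d3": DataSet("d3", size_gb=50, gen_hours=1, uses_per_month=1, preds=["d2"]),
}
result = choose_stored(datasets, ["d1", "d2", "d3"])
print(sorted(result))
```

In this hypothetical chain only d2 is kept: it is small, expensive to regenerate (its deleted predecessor d1 must be rebuilt first), and used often, whereas d1 and d3 are large and cheap to regenerate relative to how often they are used.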
Pages: 956-976
Page count: 21
Related papers
50 in total
  • [1] Optimizing data regeneration and storage with data dependency for cloud scientific workflow systems
    Fan, Lei
    Zhou, Lin
    Wang, Meijuan
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2024, 238
  • [2] A data placement strategy for scientific workflow in hybrid cloud
    Liu, Zhanghui
    Xiang, Tao
    Lin, Bing
    Ye, Xinshu
    Wang, Haijiang
    Zhang, Ying
    Chen, Xing
    [J]. PROCEEDINGS 2018 IEEE 11TH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING (CLOUD), 2018, : 556 - 563
  • [3] Problems with Replica Placement Using Data Dependency in Scientific Cloud Workflow
    Bhattacharya, Hindol
    Chattopadhyay, Samiran
    Chattopadhyay, Matangini
    [J]. PROCEEDINGS OF 2018 FIFTH INTERNATIONAL CONFERENCE ON EMERGING APPLICATIONS OF INFORMATION TECHNOLOGY (EAIT), 2018,
  • [4] A Data Dependency and Access Threshold Based Replication Strategy for Multi-cloud Workflow Applications
    Xie, Fei
    Yan, Jun
    Shen, Jun
    [J]. SERVICE-ORIENTED COMPUTING, ICSOC 2018, 2019, 11434 : 281 - 293
  • [5] Experimental Analysis on CTT-SP Algorithm for Intermediate Data Storage in Scientific Workflow Systems
    Fan, Lei
    Meng, Sha
    Liang, Yanfang
    Liu, Xiyang
    [J]. 2015 11TH INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND SECURITY (CIS), 2015, : 458 - 461
  • [6] On-demand minimum cost benchmarking for intermediate dataset storage in scientific cloud workflow systems
    Yuan, Dong
    Yang, Yun
    Liu, Xiao
    Chen, Jinjun
    [J]. JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2011, 71 (02) : 316 - 332
  • [7] A Novel Workflow-Level Data Placement Strategy for Data-Sharing Scientific Cloud Workflows
    Li, Xuejun
    Zhang, Lei
    Wu, Yang
    Liu, Xiao
    Zhu, Erzhou
    Yi, Huikang
    Wang, Futian
    Zhang, Cheng
    Yang, Yun
    [J]. IEEE TRANSACTIONS ON SERVICES COMPUTING, 2019, 12 (03) : 370 - 383
  • [8] Data-Aware Scheduling Strategy for Scientific Workflow Applications in IaaS Cloud Computing
    Makhlouf, Sid Ahmed
    Yagoubi, Belabbas
    [J]. INTERNATIONAL JOURNAL OF INTERACTIVE MULTIMEDIA AND ARTIFICIAL INTELLIGENCE, 2019, 5 (04) : 75 - 85
  • [9] Security-aware intermediate data placement strategy in scientific cloud workflows
    Wei Liu
    Su Peng
    Wei Du
    Wei Wang
    Guo Sun Zeng
    [J]. Knowledge and Information Systems, 2014, 41 : 423 - 447
  • [10] Security-aware intermediate data placement strategy in scientific cloud workflows
    Liu, Wei
    Peng, Su
    Du, Wei
    Wang, Wei
    Zeng, Guo Sun
    [J]. KNOWLEDGE AND INFORMATION SYSTEMS, 2014, 41 (02) : 423 - 447