Data placement for scientific applications in distributed environments

Cited by: 0
Authors
Chervenak, Ann [1]
Deelman, Ewa [1]
Livny, Miron [2]
Su, Mei-Hui [1]
Schuler, Rob [1]
Bharathi, Shishir [1]
Mehta, Gaurang [1]
Vahi, Karan [1]
Affiliations
[1] Univ So Calif, Inst Informat Sci, Marina Del Rey, CA 90292 USA
[2] Univ Wisconsin, Dept Comp Sci, Madison, WI 53706 USA
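Funding
US National Science Foundation;
Keywords
DOI
Not available
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract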
Scientific applications often perform complex computational analyses that consume and produce large data sets. We are concerned with data placement policies that distribute data in ways that are advantageous for application execution, for example, by placing data sets so that they may be staged into or out of computations efficiently or by replicating them for improved performance and reliability. In particular, we propose to study the relationship between data placement services and workflow management systems. In this paper, we explore the interactions between two services used in large-scale science today. We evaluate the benefits of prestaging data using the Data Replication Service versus using the native data stage-in mechanisms of the Pegasus workflow management system. We use the astronomy application, Montage, for our experiments and modify it to study the effect of input data size on the benefits of data prestaging. As the size of input data sets increases, prestaging using a data placement service can significantly improve the performance of the overall analysis.
Pages: 146 / +
Page count: 2