Data placement for scientific applications in distributed environments

Authors
Chervenak, Ann [1]
Deelman, Ewa [1]
Livny, Miron [2]
Su, Mei-Hui [1]
Schuler, Rob [1]
Bharathi, Shishir [1]
Mehta, Gaurang [1]
Vahi, Karan [1]
Affiliations
[1] Univ So Calif, Inst Informat Sci, Marina Del Rey, CA 90292 USA
[2] Univ Wisconsin, Dept Comp Sci, Madison, WI 53706 USA
Funding
U.S. National Science Foundation
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Scientific applications often perform complex computational analyses that consume and produce large data sets. We are concerned with data placement policies that distribute data in ways that are advantageous for application execution, for example, by placing data sets so that they may be staged into or out of computations efficiently or by replicating them for improved performance and reliability. In particular, we propose to study the relationship between data placement services and workflow management systems. In this paper, we explore the interactions between two services used in large-scale science today. We evaluate the benefits of prestaging data using the Data Replication Service versus using the native data stage-in mechanisms of the Pegasus workflow management system. We use the astronomy application, Montage, for our experiments and modify it to study the effect of input data size on the benefits of data prestaging. As the size of input data sets increases, prestaging using a data placement service can significantly improve the performance of the overall analysis.
Pages: 146 / +
Page count: 2