Data placement for scientific applications in distributed environments

Cited by: 0
Authors
Chervenak, Ann [1]
Deelman, Ewa [1]
Livny, Miron [2]
Su, Mei-Hui [1]
Schuler, Rob [1]
Bharathi, Shishir [1]
Mehta, Gaurang [1]
Vahi, Karan [1]
Affiliations
[1] Univ So Calif, Inst Informat Sci, Marina Del Rey, CA 90292 USA
[2] Univ Wisconsin, Dept Comp Sci, Madison, WI 53706 USA
Funding
U.S. National Science Foundation
Keywords
DOI
Not available
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Scientific applications often perform complex computational analyses that consume and produce large data sets. We are concerned with data placement policies that distribute data in ways that are advantageous for application execution, for example, by placing data sets so that they may be staged into or out of computations efficiently or by replicating them for improved performance and reliability. In particular, we propose to study the relationship between data placement services and workflow management systems. In this paper, we explore the interactions between two services used in large-scale science today. We evaluate the benefits of prestaging data using the Data Replication Service versus using the native data stage-in mechanisms of the Pegasus workflow management system. We use the astronomy application, Montage, for our experiments and modify it to study the effect of input data size on the benefits of data prestaging. As the size of input data sets increases, prestaging using a data placement service can significantly improve the performance of the overall analysis.
Pages: 146 / +
Page count: 2