Boosting Performance of Data-intensive Analysis Workflows with Distributed Coordinated Caching

Authors
Heidecker, C. [1]
von Cube, R. F. [1]
Giffels, M. [1]
Quast, G. [1]
Sauter, M. [1]
Schnepf, M. J. [1]
Affiliation
[1] KIT Karlsruhe Inst Technol, Karlsruhe, Germany
DOI
10.1088/1742-6596/1525/1/012065
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203; 0835;
Abstract
Data-intensive end-user analyses in high energy physics (HEP) require high data throughput to reach short turnaround cycles. This poses enormous challenges for storage and network infrastructure, especially given the rapidly increasing amount of data to be processed during High-Luminosity LHC runs. Integrating opportunistic resources with volatile storage systems into traditional HEP computing facilities makes the situation more complex. Bringing data close to the computing units is a promising approach to overcome throughput limitations and improve overall performance. We focus on coordinated distributed caching, in which workflows are steered to the hosts that are most suitable in terms of the files they already cache. This allows the overall processing efficiency of data-intensive workflows to be optimized and the limited cache volume to be used efficiently, because less data is replicated across the distributed caches. At KIT, we developed the NaviX coordination service, which realizes coordinated distributed caching using XRootD caching proxy server infrastructure and the HTCondor batch system. In this paper, we present the experience gained in operating coordinated distributed caches on cloud and HPC resources. Furthermore, we show benchmarks of a dedicated high-throughput cluster, the Throughput-Optimized Analysis-System (TOpAS), which is based on this concept.
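The idea of steering a workflow to the host that already caches most of its input files can be illustrated with a minimal sketch. It is not the NaviX implementation, and it does not use the XRootD or HTCondor APIs. The function `pick_host`, the host names, and the file paths are hypothetical, and the overlap-count heuristic is an assumption chosen for illustration.

```python
from typing import Dict, Iterable, Optional, Set


def pick_host(job_files: Iterable[str], cached: Dict[str, Set[str]]) -> Optional[str]:
    """Return the host caching the most of job_files, or None if no host caches any.

    Hypothetical helper: a plain overlap count stands in for whatever
    ranking a real coordination service would apply.
    """
    wanted = set(job_files)
    best_host, best_hits = None, 0
    for host in sorted(cached):  # sorted so that ties resolve deterministically
        hits = len(wanted & cached[host])
        if hits > best_hits:
            best_host, best_hits = host, hits
    return best_host


# Hypothetical cache contents per worker node.
cached = {
    "node-a": {"/store/f1.root", "/store/f2.root"},
    "node-b": {"/store/f2.root", "/store/f3.root", "/store/f4.root"},
}
print(pick_host(["/store/f3.root", "/store/f4.root"], cached))  # node-b
```

Sending jobs to the host with the largest overlap means a file tends to be cached once instead of on every node that happens to read it, which is the replication saving the abstract describes.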
Pages: 5