Leveraging State-of-the-Art Engines for Large-Scale Data Analysis in High Energy Physics

Cited by: 4
Authors
Padulano, Vincenzo Eduardo [1, 2]
Kabadzhov, Ivan Donchev [1, 3]
Saavedra, Enric Tejedor [1 ]
Guiraud, Enrico [1 ]
Alonso-Jordá, Pedro [2]
Affiliations
[1] CERN, EP SFT, CH-1211 Geneva, Switzerland
[2] Univ Politecn Valencia, Dept Computat Syst & Computat, Valencia 46022, Valencia, Spain
[3] Albert Ludwig Univ Freiburg, Dept Comp Sci, D-79098 Freiburg, Germany
Keywords
ROOT; High energy physics; Distributed computing; Dask; Spark; Evolution
DOI
10.1007/s10723-023-09645-2
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The Large Hadron Collider (LHC) at CERN has generated a vast amount of information from physics events, reaching peaks of TB of data per day which are then sent to large storage facilities. Traditionally, data processing workflows in the High Energy Physics (HEP) field have leveraged grid computing resources. In this context, users have been responsible for manually parallelising the analysis, sending tasks to computing nodes and aggregating the partial results. Analysis environments in this field have had a common building block in the ROOT software framework. This is the de facto standard tool for storing, processing and visualising HEP data. ROOT offers a modern analysis tool called RDataFrame, which can parallelise computations from a single machine to a distributed cluster while hiding most of the scheduling and result aggregation complexity from users. This is currently done by leveraging Apache Spark as the distributed execution engine, but other alternatives are being explored by HEP research groups. Notably, Dask has rapidly gained popularity thanks to its ability to interface with batch queuing systems, widespread in HEP grid computing facilities. Furthermore, future upgrades of the LHC are expected to bring a dramatic increase in data volumes. This paper presents a novel implementation of the Dask backend for the distributed RDataFrame tool in order to address the aforementioned future trends. The scalability of the tool with both the new backend and the already available Spark backend is demonstrated for the first time on more than two thousand cores, testing a real HEP analysis.
Pages: 21
Published in: Journal of Grid Computing, 2023, 21