Next Generation Workload Management System For Big Data on Heterogeneous Distributed Computing

Cited by: 17
Authors
Klimentov, A. [1]
Buncic, P. [2]
De, K. [3,4]
Jha, S.
Maeno, T. [1]
Mount, R. [5]
Nilsson, P. [1]
Oleynik, D. [3,4]
Panitkin, S. [1]
Petrosyan, A. [3,4]
Porter, R. J. [6]
Read, K. F. [7]
Vaniachine, A. [8]
Wells, J. C. [7]
Wenaus, T. [1]
Affiliations
[1] Brookhaven Natl Lab, Upton, NY 11973 USA
[2] CERN, Geneva, Switzerland
[3] Univ Texas Arlington, Arlington, TX 76019 USA
[4] Rutgers State Univ, Piscataway, NJ USA
[5] SLAC Natl Accelerator Lab, Menlo Pk, CA USA
[6] Univ Calif Berkeley, Lawrence Berkeley Natl Lab, Berkeley, CA 94720 USA
[7] Oak Ridge Natl Lab, Oak Ridge, TN USA
[8] Argonne Natl Lab Lemont, Argonne, IL USA
DOI
10.1088/1742-6596/608/1/012040
Chinese Library Classification
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
The Large Hadron Collider (LHC), operating at the international CERN Laboratory in Geneva, Switzerland, is leading Big Data driven scientific explorations. Experiments at the LHC explore the fundamental nature of matter and the basic forces that shape our universe, and were recently credited with the discovery of a Higgs boson. ATLAS and ALICE are the largest collaborations ever assembled in the sciences and are at the forefront of research at the LHC. To address an unprecedented multi-petabyte data processing challenge, both experiments rely on a heterogeneous distributed computational infrastructure. The ATLAS experiment uses the PanDA (Production and Data Analysis) Workload Management System (WMS) to manage the workflow for all data processing at hundreds of data centers. Through PanDA, ATLAS physicists see a single computing facility that enables rapid scientific breakthroughs for the experiment, even though the data centers are physically scattered all over the world. The scale is demonstrated by the following numbers: PanDA manages O(10^2) sites, O(10^5) cores, O(10^8) jobs per year, and O(10^3) users, and the ATLAS data volume is O(10^17) bytes. In 2013 we started an ambitious program to expand PanDA to all available computing resources, including opportunistic use of commercial and academic clouds and Leadership Computing Facilities (LCF). The project, titled 'Next Generation Workload Management and Analysis System for Big Data' (BigPanDA), is funded by DOE ASCR and HEP. Extending PanDA to clouds and LCF presents new challenges in managing heterogeneity and supporting workflows. The BigPanDA project is underway to set up and tailor PanDA at the Oak Ridge Leadership Computing Facility (OLCF) and at the National Research Center "Kurchatov Institute", together with ALICE distributed computing and ORNL computing professionals. Our approach to integrating HPC platforms at the OLCF and elsewhere is to reuse, as much as possible, existing components of the PanDA system.
We will present our current accomplishments with running the PanDA WMS at OLCF and other supercomputers and demonstrate our ability to use PanDA as a portal independent of the computing facilities infrastructure for High Energy and Nuclear Physics as well as other data-intensive science applications.
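The scale figures quoted in the abstract are order-of-magnitude values, so they support only rough estimates. The following is a minimal back-of-envelope sketch, assuming each O(·) figure is an exact power of ten, a 365-day year, and that every core runs one single-core job at a time around the clock (these assumptions are ours, not the authors'). It converts the yearly job count into an average dispatch rate and a mean job length.

```python
# Back-of-envelope scale check from the abstract's order-of-magnitude figures.
# Assumption: each O(10^k) figure is taken as exactly 10^k.

SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000 s (non-leap year, assumed)

SITES = 10**2          # O(10^2) sites
CORES = 10**5          # O(10^5) cores
JOBS_PER_YEAR = 10**8  # O(10^8) jobs per year
DATA_BYTES = 10**17    # O(10^17) bytes of ATLAS data


def jobs_per_second(jobs_per_year: float) -> float:
    """Average job throughput implied by a yearly job count."""
    return jobs_per_year / SECONDS_PER_YEAR


def hours_per_job(cores: float, jobs_per_year: float) -> float:
    """Mean wall-clock hours per job, assuming every core is busy all year
    running one single-core job at a time (a simplifying assumption)."""
    core_hours = cores * SECONDS_PER_YEAR / 3600
    return core_hours / jobs_per_year


def cores_per_site(cores: float, sites: float) -> float:
    """Average number of cores per site."""
    return cores / sites


if __name__ == "__main__":
    print(f"jobs/s     ~ {jobs_per_second(JOBS_PER_YEAR):.2f}")         # ~3.17
    print(f"hours/job  ~ {hours_per_job(CORES, JOBS_PER_YEAR):.2f}")   # ~8.76
    print(f"cores/site ~ {cores_per_site(CORES, SITES):.0f}")          # ~1000
    print(f"data       ~ {DATA_BYTES / 10**15:.0f} PB")                 # ~100 PB
```

Under these assumptions the system must dispatch roughly three jobs per second on average, with jobs lasting several hours each. Real job mixes (multi-core jobs, idle capacity, bursts) will shift both numbers, so they indicate scale only.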
Pages: 8