Next Generation Workload Management System For Big Data on Heterogeneous Distributed Computing

Cited by: 17
Authors
Klimentov, A. [1]
Buncic, P. [2]
De, K. [3,4]
Jha, S.
Maeno, T. [1]
Mount, R. [5]
Nilsson, P. [1]
Oleynik, D. [3,4]
Panitkin, S. [1]
Petrosyan, A. [3,4]
Porter, R. J. [6]
Read, K. F. [7]
Vaniachine, A. [8]
Wells, J. C. [7]
Wenaus, T. [1]
Affiliations
[1] Brookhaven Natl Lab, Upton, NY 11973 USA
[2] CERN, Geneva, Switzerland
[3] Univ Texas Arlington, Arlington, TX 76019 USA
[4] Rutgers State Univ, Piscataway, NJ USA
[5] SLAC Natl Accelerator Lab, Menlo Pk, CA USA
[6] Univ Calif Berkeley, Lawrence Berkeley Natl Lab, Berkeley, CA 94720 USA
[7] Oak Ridge Natl Lab, Oak Ridge, TN USA
[8] Argonne Natl Lab Lemont, Argonne, IL USA
DOI
10.1088/1742-6596/608/1/012040
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
The Large Hadron Collider (LHC), operating at the international CERN Laboratory in Geneva, Switzerland, is leading Big Data driven scientific explorations. Experiments at the LHC explore the fundamental nature of matter and the basic forces that shape our universe, and were recently credited with the discovery of a Higgs boson. ATLAS and ALICE are the largest collaborations ever assembled in the sciences and are at the forefront of research at the LHC. To address an unprecedented multi-petabyte data processing challenge, both experiments rely on a heterogeneous distributed computational infrastructure. The ATLAS experiment uses the PanDA (Production and Data Analysis) Workload Management System (WMS) to manage the workflow for all data processing at hundreds of data centers. Through PanDA, ATLAS physicists see a single computing facility that enables rapid scientific breakthroughs for the experiment, even though the data centers are physically scattered all over the world. The scale is demonstrated by the following numbers: PanDA manages O(10^2) sites, O(10^5) cores, O(10^8) jobs per year, and O(10^3) users, and the ATLAS data volume is O(10^17) bytes. In 2013 we started an ambitious program to expand PanDA to all available computing resources, including opportunistic use of commercial and academic clouds and Leadership Computing Facilities (LCF). The project, titled 'Next Generation Workload Management and Analysis System for Big Data' (BigPanDA), is funded by DOE ASCR and HEP. Extending PanDA to clouds and LCF presents new challenges in managing heterogeneity and supporting workflow. The BigPanDA project is underway to set up and tailor PanDA at the Oak Ridge Leadership Computing Facility (OLCF) and at the National Research Center "Kurchatov Institute", together with ALICE distributed computing and ORNL computing professionals. Our approach to integrating HPC platforms at the OLCF and elsewhere is to reuse, as much as possible, existing components of the PanDA system. We will present our current accomplishments with running the PanDA WMS at OLCF and other supercomputers and demonstrate our ability to use PanDA as a portal independent of the computing facilities infrastructure for High Energy and Nuclear Physics as well as other data-intensive science applications.
Pages: 8