On-demand Data Analytics in HPC Environments at Leadership Computing Facilities: Challenges and Experiences

Cited by: 0
Authors
Harney, John [1]
Lim, Seung-Hwan [1]
Sukumar, Sreenivas [1]
Stansberry, Dale [1]
Xenopoulos, Peter [1]
Affiliation
[1] Oak Ridge Natl Lab, Natl Ctr Computat Sci, Oak Ridge, TN 37830 USA
Keywords
HPC; data analytics; distributed computing
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Building data analysis infrastructures that handle continuously accumulating data is quickly becoming an essential requirement for many organizations, such as the U.S. Department of Energy (DOE). Although DOE supports some of the largest computing facilities in the world, newer analysis infrastructures such as Apache Spark are difficult to deploy in them. In this paper, we propose an on-demand Spark service that mitigates these difficulties by allowing facility users to create Spark instances quickly, easily, and flexibly. We define a systematic approach for creating these Spark instances and validate that they maintain optimal performance. Using a series of benchmarks for algorithms commonly used in scientific workflows, we compare the behavior of Spark tasks running on facility resources with that of an open research cloud with a dedicated Spark infrastructure. Finally, we use a scientific use case from the Center for Nanophase Materials Sciences at Oak Ridge National Laboratory to demonstrate the utility of running Spark in the computing facility.
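As a rough illustration of what creating an on-demand Spark instance on an HPC allocation can involve, the following Python sketch plans a Spark standalone cluster over a list of allocated node hostnames: the first node hosts the master and the remaining nodes run workers. It is not the authors' method. The function `plan_spark_cluster`, the `SparkClusterPlan` type, and the node names are hypothetical. The sketch assumes Spark 3.1 or later (where `sbin/start-worker.sh` replaced `start-slave.sh`), the standalone default master port 7077, and a host list supplied by the batch scheduler. It only builds the shell commands and does not execute them.

```python
"""Hypothetical sketch: plan a Spark standalone cluster on a batch allocation."""
from dataclasses import dataclass
from typing import Dict, List

# Default port of the Spark standalone master.
SPARK_MASTER_PORT = 7077


@dataclass
class SparkClusterPlan:
    master_host: str
    worker_hosts: List[str]
    master_url: str
    commands: Dict[str, List[str]]  # host -> shell commands to run on that host


def plan_spark_cluster(hosts: List[str],
                       spark_home: str = "$SPARK_HOME",
                       port: int = SPARK_MASTER_PORT) -> SparkClusterPlan:
    """Assign the first allocated node as master and the rest as workers.

    Assumption: `hosts` is the node list of the current batch job.
    """
    if not hosts:
        raise ValueError("need at least one allocated node")
    # Remove duplicate hostnames (e.g., one entry per core) while keeping order.
    unique = list(dict.fromkeys(hosts))
    master = unique[0]
    # On a single-node allocation, the master node also runs a worker.
    workers = unique[1:] or [master]
    url = f"spark://{master}:{port}"

    commands: Dict[str, List[str]] = {
        master: [f"SPARK_MASTER_HOST={master} SPARK_MASTER_PORT={port} "
                 f"{spark_home}/sbin/start-master.sh"]
    }
    for worker in workers:
        # Each worker registers with the master through its spark:// URL.
        commands.setdefault(worker, []).append(
            f"{spark_home}/sbin/start-worker.sh {url}")
    return SparkClusterPlan(master, workers, url, commands)


if __name__ == "__main__":
    plan = plan_spark_cluster(["node01", "node02", "node03"])
    print(plan.master_url)
    for host, cmds in plan.commands.items():
        for cmd in cmds:
            print(f"{host}: {cmd}")
```

In a batch job, each host's commands would be run on that node, after which Spark applications could be submitted to `master_url`. Keeping planning separate from execution makes the layout easy to inspect before any daemons are started.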
Pages: 2087-2096
Page count: 10