Facilitating the HPC Data Center Host efficiency through Big Data Analytics

Authors
Rager, Jack [1]
Liu, Fang Cherry [2]
Affiliations
[1] Georgia Inst Technol, Sch Ind & Syst Engn, Atlanta, GA 30332 USA
[2] Georgia Inst Technol, Partnership Adv Comp Environm PACE, Atlanta, GA 30332 USA
Keywords
High Performance Computing; Host Analysis; Unsupervised Machine Learning; Data Center
DOI
10.1109/BigData50022.2020.9378487
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Quality of service is an important feature of a High Performance Computing (HPC) center such as the Partnership for an Advanced Computing Environment (PACE) at the Georgia Institute of Technology (Georgia Tech). A user's job running on an HPC center may fail for a spectrum of reasons, and one major contributor is hardware and network failure. Reducing the hardware failure rate can significantly increase a data center's quality of service and reduce the cost of human intervention. This is critical during PACE's transition to a fee-based service model in which uptime correlates directly with revenue. PACE runs around 9 million jobs each year, with a job failure rate of 12%. To extend the service life of hardware and reduce potential failures and data center cost, we present a machine learning method for understanding the center's host usage patterns. By clustering the hosts on multiple features, we reshuffle the host list so that hosts are not overused over time. We build a test framework that runs complex combinations of experiments and presents ad hoc comparisons. We intend to make the machine learning method rack-aware, and we show meaningful results with rack information included.
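The clustering-and-reshuffling idea in the abstract can be illustrated with a minimal sketch. The abstract does not name the clustering algorithm, the features, or the reordering rule, so everything here is an assumption made for illustration: plain k-means stands in for the unsupervised method, the three features (jobs run, CPU hours, failure count) and the host names are hypothetical, and "reshuffling" is taken to mean listing hosts from the least-used cluster first. Rack awareness is not modeled.

```python
import math
from collections import defaultdict

# Hypothetical per-host features: (jobs_run, cpu_hours, failure_count).
EXAMPLE_HOSTS = {
    "node-a": (1200, 9000.0, 14),
    "node-b": (1100, 8700.0, 11),
    "node-c": (150, 900.0, 1),
    "node-d": (200, 1100.0, 2),
}

def normalize(rows):
    """Min-max scale each feature column to [0, 1]."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]
    return [[(v - lo) / sp for v, lo, sp in zip(row, lows, spans)] for row in rows]

def kmeans(points, k, iterations=50):
    """Lloyd's k-means with deterministic farthest-point seeding."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Seed the next centroid with the point farthest from the existing ones.
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        groups = defaultdict(list)
        for p, lab in zip(points, labels):
            groups[lab].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = [
            [sum(dim) / len(groups[j]) for dim in zip(*groups[j])] if groups[j] else centroids[j]
            for j in range(k)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids

def reshuffle_hosts(hosts, k=2):
    """Order host names so that hosts in the least-used cluster come first."""
    names = list(hosts)
    scaled = normalize([hosts[n] for n in names])
    labels, centroids = kmeans(scaled, k)
    # Assumed load score: mean of a cluster centroid's scaled features.
    load = [sum(c) / len(c) for c in centroids]
    order = sorted(
        range(len(names)),
        key=lambda i: (load[labels[i]], sum(scaled[i]), names[i]),
    )
    return [names[i] for i in order]

if __name__ == "__main__":
    print(reshuffle_hosts(EXAMPLE_HOSTS))
```

With the example data, the two lightly used hosts form one cluster and are placed ahead of the two heavily used ones, so a scheduler reading the list in order would favor them. A production version would need real usage features and a rule for how often to recompute the ordering.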
Pages: 3280-3287
Page count: 8