Facilitating the HPC Data Center Host efficiency through Big Data Analytics

Authors
Rager, Jack [1]
Liu, Fang Cherry [2]
Affiliations
[1] Georgia Inst Technol, Sch Ind & Syst Engn, Atlanta, GA 30332 USA
[2] Georgia Inst Technol, Partnership Adv Comp Environm PACE, Atlanta, GA 30332 USA
Keywords
High Performance Computing; Host Analysis; Unsupervised Machine Learning; Data Center
DOI
10.1109/BigData50022.2020.9378487
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Quality of service is an important feature for a High Performance Computing (HPC) center such as the Partnership for an Advanced Computing Environment (PACE) at the Georgia Institute of Technology (Georgia Tech). A user's job running at an HPC center may fail for a spectrum of reasons, and hardware and network failures are among the major contributors. Reducing the hardware failure rate can significantly increase a data center's quality of service and reduce the cost of human intervention. This is critical during PACE's transition to a fee-based service model in which uptime correlates directly with revenue. PACE runs around 9 million jobs each year, with a job failure rate of 12%. To extend the service life of hardware and reduce potential failures and the data center's costs, we present a machine learning method to understand the center's host usage patterns. By clustering the hosts based on multiple features, we reshuffle the host list to prevent hosts from being overused over time. We build a test framework that runs complex combinations of experiments and presents ad hoc comparisons. We intend to make the machine learning method rack-aware, and we show meaningful results with rack information included.
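The abstract does not name the clustering algorithm or the host features, so the following is only a minimal sketch of the general idea (cluster hosts on several usage features, then reorder the host list so lightly used hosts are offered first). It assumes plain k-means, min-max feature scaling, a simple deterministic centroid initialization, and three hypothetical features per host (CPU hours, job count, failure count). The function names, host names, and numbers are illustrative and are not taken from the paper.

```python
from statistics import mean


def min_max_scale(rows):
    """Scale every feature column to [0, 1] so no single feature dominates."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [
        tuple((v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi))
        for row in rows
    ]


def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda c: sum((x - y) ** 2 for x, y in zip(point, centroids[c])),
    )


def kmeans(points, k, iters=50):
    """Lloyd's k-means; returns one cluster label per point.

    Assumption: centroids start at evenly spaced points of the list sorted by
    feature sum, chosen here only to keep the sketch deterministic.
    """
    ordered = sorted(points, key=sum)
    n = len(ordered)
    centroids = [ordered[(i * (n - 1)) // max(k - 1, 1)] for i in range(k)]
    for _ in range(iters):
        labels = [nearest(p, centroids) for p in points]
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Keep the old centroid if a cluster ends up empty.
            updated.append(
                tuple(mean(col) for col in zip(*members)) if members else centroids[c]
            )
        if updated == centroids:
            break
        centroids = updated
    return [nearest(p, centroids) for p in points]


def reorder_hosts(usage, k=2):
    """Return host names ordered from the least-loaded cluster to the most-loaded."""
    hosts = sorted(usage)
    scaled = min_max_scale([usage[h] for h in hosts])
    labels = kmeans(scaled, k)
    label_of = dict(zip(hosts, labels))
    # Cluster load score: mean of the summed scaled features of its members.
    load = {
        c: mean(sum(s) for s, lab in zip(scaled, labels) if lab == c)
        for c in set(labels)
    }
    return sorted(hosts, key=lambda h: (load[label_of[h]], h))


# Hypothetical per-host features: (cpu_hours, job_count, failure_count).
usage = {
    "node-a": (900.0, 1200, 9),
    "node-b": (850.0, 1100, 7),
    "node-c": (100.0, 150, 1),
    "node-d": (120.0, 180, 0),
}
print(reorder_hosts(usage, k=2))
```

Ties inside a cluster are broken by host name here. A rack-aware variant, as the abstract proposes, could add rack information as a further feature or as a constraint on the final ordering.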
Pages: 3280-3287
Page count: 8