An Empirical Investigation of Missing Data Handling in Cloud Node Failure Prediction

Cited by: 11
Authors
Ma, Minghua [1]
Liu, Yudong [1]
Tong, Yuang [1]
Li, Haozhe [1]
Zhao, Pu [1]
Xu, Yong [1]
Zhang, Hongyu [2]
He, Shilin [1]
Wang, Lu [1]
Dang, Yingnong [3]
Rajmohan, Saravanakumar [4]
Lin, Qingwei [1]
Affiliations
[1] Microsoft Research, Beijing, People's Republic of China
[2] University of Newcastle, Callaghan, NSW, Australia
[3] Microsoft Azure, Redmond, WA USA
[4] Microsoft 365, Redmond, WA USA
Funding
Australian Research Council
Keywords
Node failure prediction; Missing data; Cloud systems; Interpolation
DOI
10.1145/3540250.3558946
Chinese Library Classification number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
Cloud computing systems have become increasingly popular in recent years. A typical cloud system utilizes millions of computing nodes as the basic infrastructure. Node failure has been identified as one of the most prevalent causes of cloud system downtime. To improve the reliability of cloud systems, many previous studies collected monitoring metrics from nodes and built models to predict node failures before they happen. However, based on our experience with large-scale real-world cloud systems at Microsoft, we find that the task of predicting node failure is severely hampered by missing data. A large amount of data is missing, and the problem is even worse for the most recent online data used for prediction. As a result, the real-time performance of the node failure prediction model is limited. In this paper, we first characterize the missing data problem for node failure prediction. Then, we evaluate several existing data interpolation approaches, and find that node dimension interpolation approaches outperform time dimension ones and that deep learning based interpolation is the best for early prediction. Our findings can help academics and engineers address the missing data problem in cloud node failure prediction and other data-driven software engineering scenarios.
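The distinction the abstract draws between time dimension and node dimension interpolation can be illustrated with a minimal sketch. It is not the paper's method: the record does not describe the evaluated approaches, so the two functions below (`fill_time_dimension`, `fill_node_dimension`), the carry-forward and cross-node-mean rules, and the sample values are all assumptions chosen only to show the two directions in which a gap can be filled.

```python
from statistics import mean
from typing import List, Optional

# Hypothetical layout: one row per node, one column per time step; None marks a missing sample.
Matrix = List[List[Optional[float]]]


def fill_time_dimension(metrics: Matrix) -> Matrix:
    """Fill each gap from the same node's own history (last observed value carried forward)."""
    filled = []
    for row in metrics:
        last = None
        new_row = []
        for value in row:
            if value is not None:
                last = value
            new_row.append(last)  # remains None if the node has no earlier sample
        filled.append(new_row)
    return filled


def fill_node_dimension(metrics: Matrix) -> Matrix:
    """Fill each gap with the mean of the other nodes observed at the same time step."""
    filled = [list(row) for row in metrics]
    n_steps = len(metrics[0]) if metrics else 0
    for t in range(n_steps):
        observed = [row[t] for row in metrics if row[t] is not None]
        if not observed:
            continue  # no node reported at this step, so the gap stays
        column_mean = mean(observed)
        for row in filled:
            if row[t] is None:
                row[t] = column_mean
    return filled


# Made-up metric values for three nodes over three time steps.
cpu = [
    [10.0, None, 30.0],
    [20.0, 40.0, None],
    [30.0, 60.0, None],
]
by_time = fill_time_dimension(cpu)
by_node = fill_node_dimension(cpu)
```

In this toy input the gaps at the latest time step are filled with each node's stale previous value in `by_time`, while `by_node` borrows the value reported by the one node that is still current. Real approaches in either family would be more elaborate than this.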
Pages: 1453-1464
Number of pages: 12