Disk Failure Prediction in Data Centers via Online Learning

Cited by: 58
Authors
Xiao, Jiang [1]
Xiong, Zhuang [1]
Wu, Song [1]
Yi, Yusheng [1]
Jin, Hai [1]
Hu, Kan [1]
Affiliations
[1] Huazhong Univ Sci & Technol, SCTS CGCL, Wuhan, Peoples R China
Funding
U.S. National Science Foundation;
Keywords
Failure prediction; online learning; hard disk drive; SMART; storage system reliability;
DOI
10.1145/3225058.3225106
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Disk failure has become a major concern with the rapid expansion of storage systems in data centers. Many researchers derive disk failure prediction models from SMART (Self-Monitoring, Analysis and Reporting Technology) attributes using machine learning techniques. Despite significant progress, most of this work relies on offline training, which prevents the models from adapting to continuously arriving data and leaves them exposed to the 'model aging' problem. We are therefore motivated to uncover the root cause of 'model aging', namely the dynamic SMART distribution, and to resolve the resulting performance degradation in practice. In this paper, we introduce a novel disk failure prediction model using Online Random Forests (ORFs). Our ORF-based model evolves automatically as data arrive sequentially and is thus highly adaptive to changes in the SMART distribution over time. It also compares favourably with its offline counterparts in prediction performance. Experiments on real-world datasets show that our ORF model converges rapidly to the performance of offline random forests and achieves stable failure detection rates of 93-99% with low false alarm rates. Furthermore, we demonstrate that our approach maintains stable prediction performance over long-term use in data centers.
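The online-update idea in the abstract can be illustrated with a small sketch. This is not the authors' Online Random Forest: it assumes online bagging (each arriving sample is shown to each ensemble member a Poisson(1) number of times) over a toy nearest-centroid base learner, purely to show how a model can be updated one sample at a time without retraining. The class and function names (`OnlineCentroid`, `OnlineBagging`, `fdr_far`) and the two-feature inputs are hypothetical, and the metric definitions (failure detection rate as the fraction of failed disks flagged, false alarm rate as the fraction of healthy disks flagged) are the conventional ones, assumed here rather than taken from the record.

```python
import math
import random


class OnlineCentroid:
    """Toy incremental base learner: keeps running per-class feature sums."""

    def __init__(self, n_features):
        self.sums = {0: [0.0] * n_features, 1: [0.0] * n_features}
        self.counts = {0: 0, 1: 0}

    def update(self, x, y):
        # Absorb one labelled sample (y: 0 = healthy, 1 = failing).
        self.counts[y] += 1
        for i, v in enumerate(x):
            self.sums[y][i] += v

    def predict(self, x):
        # Return the class whose running mean is closest to x.
        # Defaults to 0 (healthy) if no class has been seen yet.
        best, best_dist = 0, math.inf
        for label in (0, 1):
            n = self.counts[label]
            if n == 0:
                continue
            dist = sum((v - s / n) ** 2 for v, s in zip(x, self.sums[label]))
            if dist < best_dist:
                best, best_dist = label, dist
        return best


def poisson1(rng):
    """Draw from Poisson(lambda=1) with Knuth's multiplication method."""
    limit, k, p = math.exp(-1.0), 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


class OnlineBagging:
    """Ensemble updated one sample at a time (online bagging)."""

    def __init__(self, n_learners, n_features, seed=0):
        self.rng = random.Random(seed)
        self.learners = [OnlineCentroid(n_features) for _ in range(n_learners)]

    def update(self, x, y):
        # Each learner sees the sample k ~ Poisson(1) times, which mimics
        # bootstrap resampling without storing past data.
        for learner in self.learners:
            for _ in range(poisson1(self.rng)):
                learner.update(x, y)

    def predict(self, x):
        # Majority vote over the ensemble.
        votes = sum(learner.predict(x) for learner in self.learners)
        return 1 if 2 * votes > len(self.learners) else 0


def fdr_far(y_true, y_pred):
    """Return (failure detection rate, false alarm rate)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fdr = tp / (tp + fn) if (tp + fn) else 0.0
    far = fp / (fp + tn) if (fp + tn) else 0.0
    return fdr, far
```

In use, `update` would be called once per newly labelled SMART sample as it arrives, and `fdr_far` computed over a recent window of predictions to check whether detection quality stays stable over time.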
Pages: 10
Related papers
50 in total
  • [1] Minority Disk Failure Prediction Based on Transfer Learning in Large Data Centers of Heterogeneous Disk Systems
    Zhang, Ji
    Zhou, Ke
    Huang, Ping
    He, Xubin
    Xie, Ming
    Cheng, Bin
    Ji, Yongguang
    Wang, Yinhu
    [J]. IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2020, 31 (09) : 2155 - 2169
  • [2] CSLE: A Cost-sensitive Learning Engine for Disk Failure Prediction in Large Data Centers
    Zhang, Xinyan
    Shan, Kai
    Tan, Zhipeng
    Feng, Dan
    [J]. PROCEEDINGS OF THE 2022 DESIGN, AUTOMATION & TEST IN EUROPE CONFERENCE & EXHIBITION (DATE 2022), 2022, : 478 - 483
  • [3] Transfer Learning based Failure Prediction for Minority Disks in Large Data Centers of Heterogeneous Disk Systems
    Zhang, Ji
    Zhou, Ke
    Huang, Ping
    He, Xubin
    Xiao, Zhili
    Cheng, Bin
    Ji, Yongguang
    Wang, Yinhu
    [J]. PROCEEDINGS OF THE 48TH INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING (ICPP 2019), 2019,
  • [4] Task Failure Prediction in Cloud Data Centers Using Deep Learning
    Gao, Jiechao
    Wang, Haoyu
    Shen, Haiying
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2019, : 1111 - 1116
  • [5] Task Failure Prediction in Cloud Data Centers Using Deep Learning
    Gao, Jiechao
    Wang, Haoyu
    Shen, Haiying
    [J]. IEEE TRANSACTIONS ON SERVICES COMPUTING, 2022, 15 (03) : 1411 - 1422
  • [6] Workload Failure Prediction for Data Centers
    Li, Jie
    Wang, Rui
    Ali, Ghazanfar
    Dang, Tommy
    Sill, Alan
    Chen, Yong
    [J]. 2023 IEEE 16TH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING, CLOUD, 2023, : 479 - 485
  • [7] Disk Failure Prediction Based on Transfer Learning
    Gao, Guangfu
    Wu, Peng
    Li, Hui
    Zhang, Tianze
    [J]. INTELLIGENT COMPUTING THEORIES AND APPLICATION, ICIC 2022, PT II, 2022, 13394 : 628 - 637
  • [8] Memory Failure Prediction Using Online Learning
    Du, Xiaoming
    Li, Cong
    [J]. PROCEEDINGS OF THE INTERNATIONAL SYMPOSIUM ON MEMORY SYSTEMS (MEMSYS 2018), 2018, : 38 - 49
  • [9] Toward Adaptive Disk Failure Prediction via Stream Mining
    Han, Shujie
    Lee, Patrick P. C.
    Shen, Zhirong
    He, Cheng
    Liu, Yi
    Huang, Tao
    [J]. 2020 IEEE 40TH INTERNATIONAL CONFERENCE ON DISTRIBUTED COMPUTING SYSTEMS (ICDCS), 2020, : 628 - 638
  • [10] Hard Disk Failure Prediction Based on Blending Ensemble Learning
    Zhang, Mingyu
    Ge, Wenqiang
    Tang, Ruichun
    Liu, Peishun
    [J]. APPLIED SCIENCES-BASEL, 2023, 13 (05):