Minority Disk Failure Prediction Based on Transfer Learning in Large Data Centers of Heterogeneous Disk Systems

Cited by: 28
Authors
Zhang, Ji [1]
Zhou, Ke [1]
Huang, Ping [2]
He, Xubin [2]
Xie, Ming [3]
Cheng, Bin [3]
Ji, Yongguang [3]
Wang, Yinhu [3]
Affiliations
[1] Huazhong Univ Sci & Technol, Intelligent Cloud Storage Joint Res Ctr, Wuhan Natl Lab Optoelect, Key Lab Informat Storage Syst, Wuhan 430074, Peoples R China
[2] Temple Univ, Philadelphia, PA 19122 USA
[3] Tencent Inc, Shenzhen 518057, Peoples R China
Funding
National Natural Science Foundation of China; US National Science Foundation
Keywords
Data centers; Servers; Data models; Predictive models; Training data; Support vector machines; Reliability; Disk failure; machine learning; transfer learning; cloud computing; data center; NEURAL-NETWORK; CLASSIFICATION;
DOI
10.1109/TPDS.2020.2985346
Chinese Library Classification (CLC) number
TP301 [Theory and methods]
Subject classification code
081202
Abstract
The storage system in large scale data centers is typically built upon thousands or even millions of disks, where disk failures constantly happen. A disk failure could lead to serious data loss and thus system unavailability or even catastrophic consequences if the lost data cannot be recovered. While replication and erasure coding techniques have been widely deployed to guarantee storage availability and reliability, disk failure prediction is gaining popularity as it has the potential to prevent disk failures from occurring in the first place. Recent trends have turned toward applying machine learning approaches based on disk SMART attributes for disk failure predictions. However, traditional machine learning (ML) approaches require a large set of training data in order to deliver good predictive performance. In large-scale storage systems, new disks enter gradually to augment the storage capacity or to replace failed disks, leading storage systems to consist of small amounts of new disks from different vendors and/or different models from the same vendor as time goes on. We refer to this relatively small amount of disks as minority disks. Due to the lack of sufficient training data, traditional ML approaches fail to deliver satisfactory predictive performance in evolving storage systems which consist of heterogeneous minority disks. To address this challenge and improve the predictive performance for minority disks in large data centers, we propose a minority disk failure prediction model named TLDFP based on a transfer learning approach. Our evaluation results in two realistic datasets have demonstrated that TLDFP can deliver much more precise results and lower additional maintenance cost, compared to four popular prediction models based on traditional ML algorithms and two state-of-the-art transfer learning methods.
Pages: 2155-2169
Page count: 15
Related papers
50 in total
  • [1] Transfer Learning based Failure Prediction for Minority Disks in Large Data Centers of Heterogeneous Disk Systems
    Zhang, Ji
    Zhou, Ke
    Huang, Ping
    He, Xubin
    Xiao, Zhili
    Cheng, Bin
    Ji, Yongguang
    Wang, Yinhu
    [J]. PROCEEDINGS OF THE 48TH INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING (ICPP 2019), 2019,
  • [2] Disk Failure Prediction Based on Transfer Learning
    Gao, Guangfu
    Wu, Peng
    Li, Hui
    Zhang, Tianze
    [J]. INTELLIGENT COMPUTING THEORIES AND APPLICATION, ICIC 2022, PT II, 2022, 13394 : 628 - 637
  • [3] Disk Failure Prediction in Data Centers via Online Learning
    Xiao, Jiang
    Xiong, Zhuang
    Wu, Song
    Yi, Yusheng
    Jin, Hai
    Hu, Kan
    [J]. PROCEEDINGS OF THE 47TH INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING, 2018,
  • [4] CSLE: A Cost-sensitive Learning Engine for Disk Failure Prediction in Large Data Centers
    Zhang, Xinyan
    Shan, Kai
    Tan, Zhipeng
    Feng, Dan
    [J]. PROCEEDINGS OF THE 2022 DESIGN, AUTOMATION & TEST IN EUROPE CONFERENCE & EXHIBITION (DATE 2022), 2022, : 478 - 483
  • [5] Disk Failure Prediction in Heterogeneous Environments
    Rincon C, Carlos A.
    Paris, Jehan-Francois
    Vilalta, Ricardo
    Cheng, Albert M. K.
    Long, Darrell D. E.
    [J]. 2017 INTERNATIONAL SYMPOSIUM ON PERFORMANCE EVALUATION OF COMPUTER AND TELECOMMUNICATION SYSTEMS (SPECTS), 2017,
  • [6] Hard Disk Failure Prediction Based on Blending Ensemble Learning
    Zhang, Mingyu
    Ge, Wenqiang
    Tang, Ruichun
    Liu, Peishun
    [J]. APPLIED SCIENCES-BASEL, 2023, 13 (05):
  • [7] Transfer Learning for Bayesian Networks with Application on Hard Disk Drives Failure Prediction
    Pereira, Francisco Lucas F.
    Lima, Fernando Dione S.
    Leite, Lucas G. M.
    Gomes, Joao Paulo P.
    Machado, Javam C.
    [J]. 2017 6TH BRAZILIAN CONFERENCE ON INTELLIGENT SYSTEMS (BRACIS), 2017, : 228 - 233
  • [8] Disk Failure Prediction Model for Information Systems based on SMART Technology
    Yang, Yin
    Liang, Wei
    Li, Wenyi
    [J]. PROCEEDINGS OF THE 2015 2ND INTERNATIONAL CONFERENCE ON ELECTRICAL, COMPUTER ENGINEERING AND ELECTRONICS (ICECEE 2015), 2015, 24 : 14 - 18
  • [9] HDDse: Enabling High-Dimensional Disk State Embedding for Generic Failure Detection System of Heterogeneous Disks in Large Data Centers
    Zhang, Ji
    Huang, Ping
    Zhou, Ke
    Xie, Ming
    Schelter, Sebastian
    [J]. PROCEEDINGS OF THE 2020 USENIX ANNUAL TECHNICAL CONFERENCE, 2020, : 111 - 126
  • [10] A Disk Failure Prediction Method Based on Active Semi-supervised Learning
    Zhou, Yang
    Wang, Fang
    Feng, Dan
    [J]. ACM TRANSACTIONS ON STORAGE, 2022, 18 (04)