Minority Disk Failure Prediction Based on Transfer Learning in Large Data Centers of Heterogeneous Disk Systems

Cited by: 28
Authors
Zhan, Ji [1]
Zhou, Ke [1]
Huang, Ping [2]
He, Xubin [2]
Xie, Ming [3]
Cheng, Bin [3]
Ji, Yongguang [3]
Wang, Yinhu [3]
Affiliations
[1] Huazhong Univ Sci & Technol, Intelligent Cloud Storage Joint Res Ctr, Wuhan Natl Lab Optoelect, Key Lab Informat Storage Syst, Wuhan 430074, Peoples R China
[2] Temple Univ, Philadelphia, PA 19122 USA
[3] Tencent Inc, Shenzhen 518057, Peoples R China
Funding
National Natural Science Foundation of China; U.S. National Science Foundation;
Keywords
Data centers; Servers; Data models; Predictive models; Training data; Support vector machines; Reliability; Disk failure; machine learning; transfer learning; cloud computing; data center; NEURAL-NETWORK; CLASSIFICATION;
DOI
10.1109/TPDS.2020.2985346
Chinese Library Classification number
TP301 [Theory, Methods];
Subject classification code
081202 ;
Abstract
The storage system in large scale data centers is typically built upon thousands or even millions of disks, where disk failures constantly happen. A disk failure could lead to serious data loss and thus system unavailability or even catastrophic consequences if the lost data cannot be recovered. While replication and erasure coding techniques have been widely deployed to guarantee storage availability and reliability, disk failure prediction is gaining popularity as it has the potential to prevent disk failures from occurring in the first place. Recent trends have turned toward applying machine learning approaches based on disk SMART attributes for disk failure predictions. However, traditional machine learning (ML) approaches require a large set of training data in order to deliver good predictive performance. In large-scale storage systems, new disks enter gradually to augment the storage capacity or to replace failed disks, leading storage systems to consist of small amounts of new disks from different vendors and/or different models from the same vendor as time goes on. We refer to this relatively small amount of disks as minority disks. Due to the lack of sufficient training data, traditional ML approaches fail to deliver satisfactory predictive performance in evolving storage systems which consist of heterogeneous minority disks. To address this challenge and improve the predictive performance for minority disks in large data centers, we propose a minority disk failure prediction model named TLDFP based on a transfer learning approach. Our evaluation results in two realistic datasets have demonstrated that TLDFP can deliver much more precise results and lower additional maintenance cost, compared to four popular prediction models based on traditional ML algorithms and two state-of-the-art transfer learning methods.
Pages: 2155 - 2169
Page count: 15
Related papers
50 in total
  • [31] MULTIBEAM OPTICAL DISK DRIVE FOR HIGH DATA TRANSFER RATE SYSTEMS
    KATAYAMA, R
    YOSHIHARA, K
    YAMANAKA, Y
    TSUNEKANE, M
    KAYANUMA, K
    IWANAGA, T
    OKADA, O
    ONO, Y
    [J]. JAPANESE JOURNAL OF APPLIED PHYSICS PART 1-REGULAR PAPERS SHORT NOTES & REVIEW PAPERS, 1992, 31 (2B): : 630 - 634
  • [32] DRAM Failure Prediction in Large-Scale Data Centers
    Yu, Fengyuan
    Xu, Hongzuo
    Jian, Songlei
    Huang, Chenlin
    Wang, Yijie
    Wu, Zhiyue
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON JOINT CLOUD COMPUTING (JCC 2021) / 2021 9TH IEEE INTERNATIONAL CONFERENCE ON MOBILE CLOUD COMPUTING, SERVICES, AND ENGINEERING (MOBILECLOUD 2021), 2021, : 1 - 8
  • [33] An Integrated GAN-Based Approach to Imbalanced Disk Failure Data
    Yuan, Shuangshuang
    Wu, Peng
    Chen, Yuehui
    Zhang, Liqiang
    Wang, Jian
    [J]. INTELLIGENT COMPUTING THEORIES AND APPLICATION, ICIC 2022, PT II, 2022, 13394 : 615 - 627
  • [34] Multi-Instance Deep Learning Based on Attention Mechanism for Failure Prediction of Unlabeled Hard Disk Drives
    Wang, Guochao
    Wang, Yu
    Sun, Xiaojie
    [J]. IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, 2021, 70
  • [36] An event-based model for continuous media data on heterogeneous disk servers
    K. Selçuk Candan
    E. Hwang
    V.S. Subrahmanian
    [J]. Multimedia Systems, 1998, 6 : 251 - 270
  • [37] An event-based model for continuous media data on heterogeneous disk servers
    Candan, KS
    Hwang, E
    Subrahmanian, VS
    [J]. MULTIMEDIA SYSTEMS, 1998, 6 (04) : 251 - 270
  • [38] Lifelong Disk Failure Prediction via GAN-based Anomaly Detection
    Jiang, Tianming
    Zeng, Jiangfeng
    Zhou, Ke
    Huang, Ping
    Yang, Tianming
    [J]. 2019 IEEE 37TH INTERNATIONAL CONFERENCE ON COMPUTER DESIGN (ICCD 2019), 2019, : 199 - 207
  • [39] BaNHFaP: A Bayesian Network based Failure Prediction Approach for Hard Disk Drives
    Chaves, Iago C.
    de Paula, Manoel Rui P.
    Leite, Lucas G. M.
    Queiroz, Lucas P.
    Gomes, Joao Paulo P.
    Machado, Javam C.
    [J]. PROCEEDINGS OF 2016 5TH BRAZILIAN CONFERENCE ON INTELLIGENT SYSTEMS (BRACIS 2016), 2016, : 427 - 432
  • [40] Multi-task Hierarchical Classification for Disk Failure Prediction in Online Service Systems
    Liu, Yudong
    Yang, Hailan
    Zhao, Pu
    Ma, Minghua
    Wen, Chengwu
    Zhang, Hongyu
    Luo, Chuan
    Lin, Qingwei
    Yi, Chang
    Wang, Jiaojian
    Zhang, Chenjian
    Wang, Paul
    Dang, Yingnong
    Rajmohan, Saravan
    Zhang, Dongmei
    [J]. PROCEEDINGS OF THE 28TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, KDD 2022, 2022, : 3438 - 3446