SSD Drive Failure Prediction on Alibaba Data Center Using Machine Learning

Cited by: 1
Authors
Chen, Lei [1]
Zhu, Zongpeng [2]
Li, Anyu [2]
Mashhadi, Najmeh [1]
Frickey, Robert [1]
Ye, Jinhe [1]
Guo, Xin [1]
Affiliations
[1] Solidigm, Data Center Division, San Jose, CA 95134, USA
[2] Alibaba Group, Alibaba Cloud, Hangzhou, People's Republic of China
Source
2022 14th IEEE International Memory Workshop (IMW 2022), 2022
Keywords
SSD drive failure detection; SSD SMART data; ensemble learning; LightGBM and Random Forest; reliability; model
DOI
10.1109/IMW52921.2022.9779284
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Flash-based solid-state drives (SSDs) have become a critical storage tier in data centers and enterprise storage systems, and cloud providers have a strong interest in predicting drive failures. Failure prediction lets operators schedule drive replacement and back up data in advance, and it helps in planning drive purchase strategies. Solidigm and Alibaba collaborated to collect and analyze Self-Monitoring, Analysis, and Reporting Technology (SMART) data and to predict SSD failures 30 days in advance using machine learning techniques. In this paper, we use group k-fold cross-validation to select the best parameters for the machine learning models and to avoid overfitting. After a model produces a prediction score for each sample, a neural-network post-processing step is applied to those scores to obtain a drive-level prediction. A modified ensemble learning method, based on majority voting over multiple LightGBM and Random Forest models, is designed and implemented to further improve the prediction results. This paper is the first work in both academia and the storage industry to design a drive failure prediction system for deployment in data centers that optimizes models for the highest Precision, rather than the highest F1-score, in order to minimize the false positive rate. We achieve drive failure prediction with 100% Precision and 21% Recall, which avoids the high cost of false positives.
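The abstract names three mechanics: group k-fold cross-validation, majority voting across models, and model selection that favors Precision over F1-score. The Python sketch below illustrates each under stated assumptions; it is not the authors' implementation. The function names (`group_k_fold`, `majority_vote`, `precision_first_threshold`) are hypothetical. Groups are assumed to be drive IDs, so one drive's samples never land in both training and validation folds. The greedy fold balancing is an assumption, and the plain strict-majority vote stands in for the paper's unspecified "modified" voting scheme. The neural-network drive-level post-processing is not modeled.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Optional, Sequence, Tuple


def group_k_fold(groups: Sequence[Hashable], k: int) -> List[Tuple[List[int], List[int]]]:
    """Return k (train_indices, validation_indices) pairs.

    All samples sharing a group ID (assumed here to be a drive ID) go to the
    same fold, so no drive contributes rows to both training and validation.
    """
    by_group: Dict[Hashable, List[int]] = defaultdict(list)
    for idx, group in enumerate(groups):
        by_group[group].append(idx)

    # Greedy balancing (an assumption): place the largest groups first,
    # each into whichever fold currently holds the fewest samples.
    folds: List[List[int]] = [[] for _ in range(k)]
    for group in sorted(by_group, key=lambda g: len(by_group[g]), reverse=True):
        target = min(range(k), key=lambda f: len(folds[f]))
        folds[target].extend(by_group[group])

    splits = []
    for f in range(k):
        validation = sorted(folds[f])
        train = sorted(i for j in range(k) if j != f for i in folds[j])
        splits.append((train, validation))
    return splits


def majority_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Combine 0/1 predictions from several models (one row per model).

    A sample is flagged as failing only if a strict majority of models flag it.
    """
    n_models = len(predictions)
    return [1 if sum(column) * 2 > n_models else 0 for column in zip(*predictions)]


def precision_first_threshold(
    scores: Sequence[float], labels: Sequence[int], min_precision: float = 1.0
) -> Optional[Tuple[float, float, float]]:
    """Pick the score threshold with the highest Recall whose Precision is at
    least min_precision. Returns (threshold, precision, recall), or None if no
    threshold reaches the required Precision.
    """
    total_positives = sum(labels)
    best: Optional[Tuple[float, float, float]] = None
    for threshold in sorted(set(scores)):
        predicted = [s >= threshold for s in scores]
        tp = sum(1 for p, y in zip(predicted, labels) if p and y)
        fp = sum(1 for p, y in zip(predicted, labels) if p and not y)
        if tp + fp == 0:
            continue  # nothing flagged, Precision is undefined
        precision = tp / (tp + fp)
        recall = tp / total_positives if total_positives else 0.0
        if precision >= min_precision and (best is None or recall > best[2]):
            best = (threshold, precision, recall)
    return best
```

For example, with scores `[0.95, 0.9, 0.8, 0.7, 0.4, 0.2]` and labels `[1, 1, 0, 1, 0, 0]`, the precision-first rule picks threshold 0.9 (Precision 1.0, Recall 2/3). An F1-maximizing rule would instead pick 0.7 (Precision 0.75, Recall 1.0, F1 about 0.86), which flags one healthy sample. This mirrors the trade-off described in the abstract: accepting lower Recall to avoid false positives.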
Pages: 29-33
Number of pages: 5