Optimizing Efficiency of Machine Learning Based Hard Disk Failure Prediction by Two-Layer Classification-Based Feature Selection

Cited by: 0
Authors
Wang, Han [1]
Zhuge, Qingfeng [1]
Sha, Edwin Hsing-Mean [1]
Xu, Rui [1]
Song, Yuhong [1]
Affiliation
[1] East China Normal Univ, Sch Comp Sci & Technol, Shanghai 200063, Peoples R China
Source
APPLIED SCIENCES-BASEL | 2023 / Vol. 13 / Issue 13
Keywords
ML; AI; disk failure prediction; timeliness; feature selection
DOI
10.3390/app13137544
Chinese Library Classification
O6 [Chemistry];
Discipline classification code
0703;
Abstract
Predicting hard disk failure effectively and efficiently can prevent the high cost of data loss in data storage systems. Disk failure prediction based on machine learning (ML) and artificial intelligence (AI) has gained notable attention because of its predictive capability. Improving the accuracy and performance of disk failure prediction, however, remains a challenging problem. When a disk failure is about to occur, little time is available for the prediction process, which includes both building models and making predictions. Faster training makes model updates more efficient, while late predictions not only have no value but also waste resources. To improve both prediction quality and modeling timeliness, this paper proposes a two-layer classification-based feature selection scheme. First, an attribute filter computes the importance of each attribute, following the idea of classification tree models, and removes attributes that are insensitive to failure identification. Second, an attribute classification method is proposed that determines the correlation between features using the correlation coefficient. The experiments applied ML and AI models including naive Bayesian, random forest, support vector machine, gradient boosted decision tree, convolutional neural network, and long short-term memory models. The results showed that the proposed technique could improve the prediction accuracy of ML/AI-based hard disk failure prediction models. Specifically, random forest and long short-term memory combined with the proposed technique showed the best accuracy. In the best case, the proposed scheme also reduced training latency by 75% and prediction latency by 83% compared with the baseline methods.
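The two layers described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the scoring formula, the thresholds, or how the correlation-based attribute classes are used. As assumptions, importance is approximated here by the best single-split Gini impurity decrease (a classification-tree idea), and the correlation layer keeps only one attribute from each highly correlated pair. The names `select_features`, `min_gain`, and `max_corr`, and the SMART-style sample data, are hypothetical.

```python
from math import sqrt

def gini(labels):
    """Gini impurity of a list of binary (0/1) labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_gini_gain(values, labels):
    """Largest impurity decrease from one threshold split on one attribute."""
    n = len(labels)
    parent = gini(labels)
    best = 0.0
    for t in sorted(set(values))[:-1]:
        left = [y for v, y in zip(values, labels) if v <= t]
        right = [y for v, y in zip(values, labels) if v > t]
        gain = parent - len(left) / n * gini(left) - len(right) / n * gini(right)
        best = max(best, gain)
    return best

def pearson(a, b):
    """Pearson correlation coefficient; 0.0 if either column is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / sqrt(va * vb)

def select_features(columns, labels, min_gain=0.1, max_corr=0.9):
    """Two-layer selection (assumed thresholds, not taken from the paper)."""
    # Layer 1: drop attributes whose best split barely separates the classes.
    gains = {name: best_gini_gain(vals, labels) for name, vals in columns.items()}
    kept = [name for name in columns if gains[name] >= min_gain]
    # Layer 2: visit attributes from most to least important and skip any
    # that is strongly correlated with an attribute already selected.
    kept.sort(key=lambda name: gains[name], reverse=True)
    selected = []
    for name in kept:
        if all(abs(pearson(columns[name], columns[s])) < max_corr for s in selected):
            selected.append(name)
    return selected

# Made-up SMART-style readings for eight disks; label 1 marks a failed disk.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
columns = {
    "smart_5": [0, 0, 1, 0, 8, 9, 12, 7],
    "smart_197": [0, 0, 2, 0, 16, 18, 24, 14],   # exactly 2 * smart_5
    "smart_9": [10, 20, 30, 40, 15, 25, 35, 45],  # interleaves both classes
}
selected = select_features(columns, labels)
print(selected)
```

On this toy data, `smart_9` is removed by the first layer because no threshold separates the classes well, and `smart_197` is removed by the second layer because it duplicates the information in `smart_5`. Training on fewer attributes is what would shorten training and prediction time, although the size of the effect depends on the model and the data.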
Pages: 18
Related papers
50 in total
  • [21] Voting Classification-Based Diabetes Mellitus Prediction Using Hypertuned Machine-Learning Techniques
    Mushtaq, Zaigham
    Ramzan, Muhammad Farhan
    Ali, Sikandar
    Baseer, Samad
    Samad, Ali
    Husnain, Mujtaba
    MOBILE INFORMATION SYSTEMS, 2022, 2022
  • [22] A Classification-Based Machine Learning Approach to the Prediction of Cyanobacterial Blooms in Chilgok Weir, South Korea
    Kim, Jongchan
    Jonoski, Andreja
    Solomatine, Dimitri P.
    WATER, 2022, 14 (04)
  • [23] Optimizing Feature Selection for Solar Park Classification: Approaches with OBIA and Machine Learning
    Ladisa, Claudio
    Capolupo, Alessandra
    Tarantino, Eufemia
    COMPUTATIONAL SCIENCE AND ITS APPLICATIONS-ICCSA 2024 WORKSHOPS, PT V, 2024, 14819 : 286 - 301
  • [24] Optimizing feature selection and remote sensing classification with an enhanced machine learning method
    Ewees, Ahmed A.
    Alshahrani, Mohammed M.
    Alharthi, Abdullah M.
    Gaheen, Marwa A.
    JOURNAL OF SUPERCOMPUTING, 2025, 81 (02):
  • [25] Graph Classification Based on Sparse Graph Feature Selection and Extreme Learning Machine
    Yu, Yajun
    Pan, Zhisong
    Hu, Guyu
    PROCEEDINGS OF ELM-2015, VOL 1: THEORY, ALGORITHMS AND APPLICATIONS (I), 2016, 6 : 179 - 191
  • [26] Graph classification based on sparse graph feature selection and extreme learning machine
    Yu, Yajun
    Pan, Zhisong
    Hu, Guyu
    Ren, Huifeng
    NEUROCOMPUTING, 2017, 261 : 20 - 27
  • [27] Disk Failure Prediction Based on Transfer Learning
    Gao, Guangfu
    Wu, Peng
    Li, Hui
    Zhang, Tianze
    INTELLIGENT COMPUTING THEORIES AND APPLICATION, ICIC 2022, PT II, 2022, 13394 : 628 - 637
  • [28] Feature Selection Based on Extreme Learning Machine
    Wang, Zhaoxi
    Zhao, Meng
    Chen, Shengyong
    ICDLT 2019: 2019 3RD INTERNATIONAL CONFERENCE ON DEEP LEARNING TECHNOLOGIES, 2019, : 57 - 63
  • [29] A Gas Emission Prediction Model Based on Feature Selection and Improved Machine Learning
    Shao, Liangshan
    Zhang, Kun
    PROCESSES, 2023, 11 (03)
  • [30] Prediction of Disk Failure Based on Classification Intensity Resampling
    Wu, Sheng
    Guan, Jihong
    INFORMATION, 2024, 15 (06)