Optimizing Efficiency of Machine Learning Based Hard Disk Failure Prediction by Two-Layer Classification-Based Feature Selection

Authors
Wang, Han [1]
Zhuge, Qingfeng [1]
Sha, Edwin Hsing-Mean [1]
Xu, Rui [1]
Song, Yuhong [1]
Affiliations
[1] East China Normal Univ, Sch Comp Sci & Technol, Shanghai 200063, Peoples R China
Source
APPLIED SCIENCES-BASEL | 2023 / Vol. 13 / Issue 13
Keywords
ML; AI; disk failure prediction; timeliness; feature selection
DOI
10.3390/app13137544
Chinese Library Classification (CLC) number
O6 [Chemistry]
Discipline classification code
0703
Abstract
Predicting hard disk failure effectively and efficiently can prevent the high costs of data loss in data storage systems. Disk failure prediction based on machine learning (ML) and artificial intelligence (AI) has gained notable attention because of its predictive capabilities. Improving the accuracy and performance of disk failure prediction, however, remains a challenging problem. When a disk failure is about to occur, little time is available for the prediction process, which includes both building models and predicting. Faster training makes model updates more efficient, while late predictions not only have no value but also waste resources. To improve both prediction quality and modeling timeliness, this paper proposes a two-layer classification-based feature selection scheme. First, an attribute filter calculates the importance of each attribute, based on the idea of classification tree models, and removes attributes that are insensitive to failure identification. Second, an attribute classification method determines the correlation between features using the correlation coefficient. The experiments applied ML and AI models including naive Bayes, random forest, support vector machine, gradient boosted decision tree, convolutional neural networks, and long short-term memory. The results showed that the proposed technique could improve the prediction accuracy of ML/AI-based hard disk failure prediction models. Specifically, random forest and long short-term memory combined with the proposed technique showed the best accuracy. Meanwhile, in the best case, the proposed scheme reduced training latency by 75% and prediction latency by 83% compared with the baseline methods.
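The two layers described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the importance formula, the thresholds, or how the correlation-based attribute classes are used afterwards. The sketch assumes that importance is the largest Gini impurity decrease from a single threshold split (the basic step of a classification tree), that attributes are grouped greedily by absolute Pearson correlation, and that one representative is kept per group. The attribute names, data, and the `min_importance` and `min_corr` values are hypothetical.

```python
from math import sqrt


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def stump_importance(values, labels):
    """Largest Gini impurity decrease achievable by one threshold split."""
    parent = gini(labels)
    n = len(labels)
    best = 0.0
    for t in sorted(set(values))[:-1]:  # the largest value cannot split the data
        left = [y for v, y in zip(values, labels) if v <= t]
        right = [y for v, y in zip(values, labels) if v > t]
        child = (len(left) * gini(left) + len(right) * gini(right)) / n
        best = max(best, parent - child)
    return best


def pearson(a, b):
    """Pearson correlation coefficient; 0.0 if either input is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def two_layer_select(columns, labels, min_importance=0.1, min_corr=0.9):
    """Assumed two-layer selection: importance filter, then correlation groups."""
    # Layer 1: drop attributes that barely help separate failed from healthy disks.
    importance = {name: stump_importance(vals, labels)
                  for name, vals in columns.items()}
    kept = [name for name in columns if importance[name] >= min_importance]

    # Layer 2: greedily group attributes that are strongly correlated with the
    # most important member of an existing group (assumption, see lead-in).
    groups = []
    for name in sorted(kept, key=lambda n: -importance[n]):
        for group in groups:
            if abs(pearson(columns[name], columns[group[0]])) >= min_corr:
                group.append(name)
                break
        else:
            groups.append([name])

    # Keep the first (most important) attribute of each group.
    selected = [group[0] for group in groups]
    return selected, groups, importance


# Hypothetical SMART-like attributes for 8 disks; label 1 marks a failed disk.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
columns = {
    "realloc_sectors": [0, 0, 1, 0, 8, 9, 12, 10],
    "pending_sectors": [0, 0, 2, 0, 16, 18, 24, 20],
    "power_on_hours": [10, 50, 20, 60, 15, 55, 25, 65],
    "temperature": [30, 31, 30, 31, 30, 31, 30, 31],
    "seek_errors": [0, 0, 0, 5, 5, 5, 5, 0],
}
selected, groups, importance = two_layer_select(columns, labels)
print(selected)
print(groups)
```

Training on fewer attributes is one plausible route to the lower training and prediction latency the abstract reports, though the abstract itself does not state the mechanism.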
Pages: 18