The Optimal Machine Learning-Based Missing Data Imputation for the Cox Proportional Hazard Model

Cited by: 12
Authors
Guo, Chao-Yu [1 ,2 ]
Yang, Ying-Chen [1 ,2 ]
Chen, Yi-Hau [3 ]
Affiliations
[1] Natl Yang Ming Univ, Inst Publ Hlth, Sch Med, Taipei, Taiwan
[2] Natl Yang Ming Chiao Tung Univ, Inst Publ Hlth, Sch Med, Hsinchu, Taiwan
[3] Acad Sinica, Inst Stat Sci, Taipei, Taiwan
Keywords
machine learning; k-nearest neighbors imputation; random forest imputation; survival data simulation; Cox proportional hazard model
DOI
10.3389/fpubh.2021.680054
Chinese Library Classification (CLC) number
R1 [Preventive Medicine, Hygiene]
Discipline classification code
1004; 120402
Abstract
Adequate imputation of missing data can substantially preserve statistical power and avoid erroneous conclusions. In the era of big data, machine learning is a powerful tool for inferring missing values. The root mean square error (RMSE) and the proportion of falsely classified entries (PFC) are two standard statistics for evaluating imputation accuracy. However, the Cox proportional hazards model with various types of imputation requires deliberate study, and its validity under different missing mechanisms is unknown. In this research, we propose supervised and unsupervised imputations and examine four machine learning-based imputation strategies. We conducted a simulation study under various scenarios, varying parameters such as sample size, missing rate, and missing mechanism. The results revealed the type-I errors associated with the different imputation techniques in survival data. The simulation results show that the non-parametric "missForest" based on unsupervised imputation is the only robust method without inflated type-I errors under all missing mechanisms. In contrast, the other methods do not yield valid tests when the missing pattern is informative. Improperly conducted statistical analysis with missing data may lead to erroneous conclusions. This research provides a clear guideline for valid survival analysis using the Cox proportional hazards model with machine learning-based imputations.
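The two accuracy statistics named in the abstract can be illustrated with a minimal sketch. It assumes that RMSE is computed over continuous entries and PFC over categorical entries, in both cases only on the values that were masked and then imputed. The function names and the toy data below are hypothetical and are not taken from the paper.

```python
import math


def rmse(true_values, imputed_values):
    """Root mean square error between true and imputed continuous entries."""
    if len(true_values) != len(imputed_values) or not true_values:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - i) ** 2 for t, i in zip(true_values, imputed_values)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def pfc(true_labels, imputed_labels):
    """Proportion of falsely classified entries among imputed categorical values."""
    if len(true_labels) != len(imputed_labels) or not true_labels:
        raise ValueError("inputs must be non-empty and of equal length")
    wrong = sum(1 for t, i in zip(true_labels, imputed_labels) if t != i)
    return wrong / len(true_labels)


# Hypothetical toy data: values that were masked, and what an imputer returned.
true_age = [50.0, 62.0, 47.0, 71.0]
imputed_age = [53.0, 58.0, 47.0, 71.0]
true_smoker = ["yes", "no", "no", "yes"]
imputed_smoker = ["yes", "yes", "no", "yes"]

age_rmse = rmse(true_age, imputed_age)          # errors 3 and 4 over 4 entries
smoker_pfc = pfc(true_smoker, imputed_smoker)   # 1 of 4 labels is wrong

print(f"RMSE = {age_rmse:.2f}, PFC = {smoker_pfc:.2f}")
```

Both statistics measure how closely imputed values match the truth. They do not by themselves indicate whether a downstream Cox model test keeps its nominal type-I error, which is the question the abstract raises.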
Pages: 8
Related papers
50 in total
  • [1] Multiple Imputation for the Cox Proportional Hazards Model with Missing Covariates
    Paik M.C.
    [J]. Lifetime Data Analysis, 1997, 3 (3) : 289 - 298
  • [2] A Machine Learning-Based Missing Data Imputation with FHIR Interoperability Approach in Sepsis Prediction
    Toro Beltran, Cristian Fernando
    Villarreal Ibanez, Erick Daniel
    Milen Orejuela, Vivian
    Garcia Henao, John Anderson
    [J]. HIGH PERFORMANCE COMPUTING, CARLA 2022, 2022, 1660 : 116 - 130
  • [3] Analysis of Machine Learning Based Imputation of Missing Data
    Rizvi, Syed Tahir Hussain
    Latif, Muhammad Yasir
    Amin, Muhammad Saad
    Telmoudi, Achraf Jabeur
    Shah, Nasir Ali
    [J]. CYBERNETICS AND SYSTEMS, 2023,
  • [4] A systematic review of machine learning-based missing value imputation techniques
    Thomas, Tressy
    Rajabi, Enayat
    [J]. DATA TECHNOLOGIES AND APPLICATIONS, 2021, 55 (04) : 558 - 585
  • [5] Multiple imputation of missing covariates for the Cox proportional hazards cure model
    Beesley, Lauren J.
    Bartlett, Jonathan W.
    Wolf, Gregory T.
    Taylor, Jeremy M. G.
    [J]. STATISTICS IN MEDICINE, 2016, 35 (26) : 4701 - 4717
  • [6] Machine learning-based imputation soft computing approach for large missing scale and non-reference data imputation
    Alamoodi, A. H.
    Zaidan, B. B.
    Zaidan, A. A.
    Albahri, O. S.
    Chen, Juliana
    Chyad, M. A.
    Garfan, Salem
    Aleesa, A. M.
    [J]. CHAOS SOLITONS & FRACTALS, 2021, 151
  • [7] Statistical inference under imputation for proportional hazard model with missing covariates
    Qiu, Zhiping
    [J]. COMMUNICATIONS IN STATISTICS-THEORY AND METHODS, 2017, 46 (23) : 11575 - 11590
  • [8] Machine Learning Based Missing Data Imputation in Categorical Datasets
    Ishaq, Muhammad
    Zahir, Sana
    Iftikhar, Laila
    Bulbul, Mohammad Farhad
    Rho, Seungmin
    Lee, Mi Young
    [J]. IEEE ACCESS, 2024, 12 : 88332 - 88344
  • [9] A novel machine learning-based imputation strategy for missing data in step-stress accelerated degradation test
    Li, Yaqiu
    Zhou, Qijie
    Fan, Ye
    Pan, Guangze
    Dai, Zongbei
    Lei, Baimao
    [J]. HELIYON, 2024, 10 (04)
  • [10] Learning-Based Adaptive Imputation Method with kNN Algorithm for Missing Power Data
    Kim, Minkyung
    Park, Sangdon
    Lee, Joohyung
    Joo, Yongjae
    Choi, Jun Kyun
    [J]. ENERGIES, 2017, 10 (10)