The Optimal Machine Learning-Based Missing Data Imputation for the Cox Proportional Hazard Model

Cited by: 12
Authors
Guo, Chao-Yu [1 ,2 ]
Yang, Ying-Chen [1 ,2 ]
Chen, Yi-Hau [3 ]
Affiliations
[1] Natl Yang Ming Univ, Inst Publ Hlth, Sch Med, Taipei, Taiwan
[2] Natl Yang Ming Chiao Tung Univ, Inst Publ Hlth, Sch Med, Hsinchu, Taiwan
[3] Acad Sinica, Inst Stat Sci, Taipei, Taiwan
Keywords
machine learning; k-nearest neighbors imputation; random forest imputation; survival data simulation; Cox proportional hazard model
DOI
10.3389/fpubh.2021.680054
Chinese Library Classification number
R1 [Preventive Medicine, Hygiene];
Discipline classification code
1004 ; 120402 ;
Abstract
Adequate imputation of missing data preserves statistical power and helps avoid erroneous conclusions. In the era of big data, machine learning is a powerful tool for inferring missing values. The root mean square error (RMSE) and the proportion of falsely classified entries (PFC) are two standard statistics for evaluating imputation accuracy. However, how the Cox proportional hazards model performs with various types of imputation requires careful study, and its validity under different missing mechanisms is unknown. In this research, we propose supervised and unsupervised imputations and examine four machine learning-based imputation strategies. We conducted a simulation study under various scenarios, varying several parameters such as sample size, missing rate, and missing mechanism. The results reveal the type-I error rates of the different imputation techniques in survival data. The simulation results show that the non-parametric "missForest" based on unsupervised imputation is the only robust method without inflated type-I errors under all missing mechanisms. In contrast, the other methods do not yield valid tests when the missing pattern is informative. Improperly conducted statistical analysis of data with missing values may lead to erroneous conclusions. This research provides a clear guideline for valid survival analysis using the Cox proportional hazards model with machine learning-based imputations.
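The two accuracy statistics named in the abstract can be illustrated with a minimal sketch. It assumes the usual definitions: RMSE is computed over the imputed entries of a continuous variable, and PFC over the imputed entries of a categorical variable. The function names, the toy `age` and `stage` variables, and their values are hypothetical and are not taken from the paper, which may use a normalized variant of RMSE.

```python
import numpy as np

def rmse(true_values, imputed_values, missing_mask):
    """Root mean square error over the entries that were missing (continuous data)."""
    mask = np.asarray(missing_mask, dtype=bool)
    t = np.asarray(true_values, dtype=float)[mask]
    i = np.asarray(imputed_values, dtype=float)[mask]
    return float(np.sqrt(np.mean((t - i) ** 2)))

def pfc(true_labels, imputed_labels, missing_mask):
    """Proportion of falsely classified entries among those that were missing (categorical data)."""
    mask = np.asarray(missing_mask, dtype=bool)
    t = np.asarray(true_labels)[mask]
    i = np.asarray(imputed_labels)[mask]
    return float(np.mean(t != i))

# Hypothetical toy data: entries 1 and 3 were missing and then imputed.
mask = np.array([False, True, False, True])
age_true = np.array([50.0, 60.0, 70.0, 80.0])
age_imputed = np.array([50.0, 63.0, 70.0, 76.0])
stage_true = np.array(["I", "II", "II", "III"])
stage_imputed = np.array(["I", "II", "II", "II"])

rmse_value = rmse(age_true, age_imputed, mask)
pfc_value = pfc(stage_true, stage_imputed, mask)
print(rmse_value, pfc_value)
```

Both statistics measure how closely imputed values match the truth. As the abstract notes, they do not by themselves establish whether a subsequent Cox model test keeps its nominal type-I error.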
Pages: 8