Missing Data Analysis Using Statistical and Machine Learning Methods in Facility-Based Maternal Health Records

Cited by: 0
Authors
Memon S.M.Z. [1]
Wamala R. [2]
Kabano I.H. [3, 4]
Affiliations
[1] Department of Statistical Methods and Actuarial Science, Makerere University, Kampala
[2] Department of Planning and Applied Statistics, Makerere University, Kampala
[3] Department of Applied Statistics, School of Economics, University of Rwanda, Kigali
[4] African Centre of Excellence in Data Science, College of Business and Economics, University of Rwanda, Kigali
Keywords
Facility-based records; Imputation; Maternal morbidity; Missing data; Missing mechanism; Missing pattern
DOI
10.1007/s42979-022-01249-z
Abstract
Missing data are the rule rather than the exception in quantitative research. What remains in question is the extent, pattern, mechanism, and treatment of missingness in facility-based paper maternal health records. We used data from maternal health records at Kawempe National Referral Hospital, Uganda, considering only records of women who gave birth at the hospital between January 2017 and January 2021. The analysis was carried out in RStudio using frequency distributions and the Pearson χ² test. Missingness was treated using listwise deletion (LD), mode imputation, multiple imputation by chained equations (MICE), K-nearest neighbors (KNN) imputation, and random forest (RF) imputation. The performance of the methods was assessed using prediction accuracy and a Kruskal–Wallis test on the standard errors (SEs) obtained from a logistic regression. Overall, 5% of the data were missing. The proportion of missingness across variables ranged from 1.4% to 20.7%. Case-wise, 2498 of the 4626 cases (54%) had at least one variable with a missing value. The pattern of missingness was arbitrary, and the data suggest a mechanism of either missing at random or missing completely at random. With the exception of LD, the imputation methods showed no difference in SEs following logistic regression (p > 0.05). LD also yielded the lowest prediction accuracy after logistic regression, whereas no major variation in prediction accuracy was noted among MICE, mode imputation, KNN, and RF. Missingness in facility-based health records should not be ignored, and researchers need to pay attention to both overall and case-wise missingness. © 2022, The Author(s), under exclusive licence to Springer Nature Singapore Pte Ltd.
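The distinction the abstract draws between overall and case-wise missingness, and the two simplest treatments it names (listwise deletion and mode imputation), can be illustrated with a minimal sketch. The study itself was run in RStudio; the Python below is not the authors' code. The toy records, the variable names (`age_group`, `parity`, `delivery_mode`), and all function names are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical toy records (NOT the study data); None marks a missing value.
records = [
    {"age_group": "20-29", "parity": "1", "delivery_mode": "vaginal"},
    {"age_group": None, "parity": "2", "delivery_mode": "caesarean"},
    {"age_group": "30-39", "parity": None, "delivery_mode": None},
    {"age_group": "20-29", "parity": "2", "delivery_mode": "vaginal"},
]


def overall_missing(rows):
    """Share of all cells (cases x variables) that are missing."""
    cells = [value for row in rows for value in row.values()]
    return sum(value is None for value in cells) / len(cells)


def per_variable_missing(rows):
    """Share of cases with a missing value, computed separately per variable."""
    return {
        name: sum(row[name] is None for row in rows) / len(rows)
        for name in rows[0]
    }


def casewise_missing(rows):
    """Share of cases that have at least one missing variable."""
    return sum(any(v is None for v in row.values()) for row in rows) / len(rows)


def listwise_deletion(rows):
    """Keep only the cases that are complete on every variable."""
    return [row for row in rows if all(v is not None for v in row.values())]


def mode_imputation(rows):
    """Replace each missing value with the most frequent observed category."""
    modes = {}
    for name in rows[0]:
        observed = [row[name] for row in rows if row[name] is not None]
        modes[name] = Counter(observed).most_common(1)[0][0]
    return [
        {name: (modes[name] if value is None else value) for name, value in row.items()}
        for row in rows
    ]


print("overall:", overall_missing(records))
print("per variable:", per_variable_missing(records))
print("case-wise:", casewise_missing(records))
print("complete cases kept by LD:", len(listwise_deletion(records)))
```

In this toy example a quarter of the cells are missing, yet half of the cases are affected, so listwise deletion discards half of the records. The same effect at a larger scale is what the abstract reports: 5% of cells missing overall, but 2498/4626 ≈ 54% of cases with at least one missing value.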