Imputation techniques on missing values in breast cancer treatment and fertility data

Cited by: 12
Authors
Wu, Xuetong [1]
Akbarzadeh Khorshidi, Hadi [1]
Aickelin, Uwe [1]
Edib, Zobaida [2]
Peate, Michelle [2]
Affiliations
[1] Univ Melbourne, Dept Comp & Informat Syst, Parkville, Vic, Australia
[2] Univ Melbourne, Dept Obstet & Gynaecol, Parkville, Vic, Australia
Keywords
Missing data; Imputation; Classification; Breast cancer; Post-treatment amenorrhoea; Women
DOI
10.1007/s13755-019-0082-4
Chinese Library Classification number
R-058
Abstract
In the last few years, clinical decision support using data mining techniques has offered a more intelligent way to reduce decision error. However, clinical datasets often suffer from high missingness, which adversely impacts the quality of modelling if handled improperly. Imputing missing values provides an opportunity to resolve the issue. Conventional approaches rely on simple statistical analysis, such as mean imputation, or on discarding cases with missing values; these have many limitations and thus degrade learning performance. This study examines a series of machine-learning-based imputation methods and suggests an efficient approach to preparing a good-quality breast cancer (BC) dataset for finding the relationship between BC treatment and chemotherapy-related amenorrhoea, where performance is evaluated by prediction accuracy. To this end, the reliability and robustness of six well-known imputation methods are evaluated. Our results show that imputation leads to a significant boost in classification performance compared with model prediction based on listwise deletion. Furthermore, the results reveal that most methods achieve strong robustness and discriminant power even when the dataset has a high missing rate (> 50%).
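The two conventional baselines named in the abstract, discarding incomplete cases (listwise deletion) and mean imputation, can be illustrated with a minimal sketch. This is not the authors' code, and it does not implement any of the six machine-learning-based methods they evaluate, which this record does not name. The helper names and the toy records are hypothetical and chosen only for illustration.

```python
from statistics import mean


def listwise_delete(rows):
    """Keep only rows that contain no missing (None) values."""
    return [r for r in rows if all(v is not None for v in r)]


def mean_impute(rows):
    """Replace each None with the mean of the observed values in its column.

    Assumes every column has at least one observed value.
    """
    n_cols = len(rows[0])
    col_means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        col_means.append(mean(observed))
    return [
        [col_means[j] if v is None else v for j, v in enumerate(r)]
        for r in rows
    ]


# Hypothetical toy records: [age, dose]; None marks a missing value.
records = [
    [34, 1.0],
    [41, None],
    [None, 2.0],
    [52, 3.0],
]

complete_cases = listwise_delete(records)  # drops the two incomplete rows
imputed = mean_impute(records)             # keeps all four rows

print(len(complete_cases), len(imputed))
```

In this toy case listwise deletion discards half of the records, while mean imputation keeps every record but fills each gap with a single column average.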
Pages: 8