Imputation techniques on missing values in breast cancer treatment and fertility data

Cited by: 12
Authors
Wu, Xuetong [1]
Akbarzadeh Khorshidi, Hadi [1]
Aickelin, Uwe [1]
Edib, Zobaida [2]
Peate, Michelle [2]
Affiliations
[1] Univ Melbourne, Dept Comp & Informat Syst, Parkville, Vic, Australia
[2] Univ Melbourne, Dept Obstet & Gynaecol, Parkville, Vic, Australia
Keywords
Missing data; Imputation; Classification; Breast cancer; Post-treatment amenorrhoea; WOMEN;
DOI
10.1007/s13755-019-0082-4
Chinese Library Classification number
R-058
Abstract
In the last few years, clinical decision support using data mining techniques has offered a more intelligent way to reduce decision error. However, clinical datasets often suffer from high missingness, which adversely impacts the quality of modelling if handled improperly. Imputing missing values provides an opportunity to resolve the issue. Conventional approaches rely on simple statistical analysis, such as mean imputation, or on discarding cases with missing values; both have many limitations and thus degrade learning performance. This study examines a series of machine learning based imputation methods and suggests an efficient approach to preparing a good-quality breast cancer (BC) dataset for finding the relationship between BC treatment and chemotherapy-related amenorrhoea, where performance is evaluated by prediction accuracy. To this end, the reliability and robustness of six well-known imputation methods are evaluated. Our results show that imputation leads to a significant boost in classification performance compared with model prediction based on listwise deletion. Furthermore, the results reveal that most methods retain strong robustness and discriminant power even when the dataset has a high missing rate (> 50%).
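As an illustration of the contrast drawn in the abstract, the following minimal sketch compares listwise deletion, mean imputation, and a simple nearest-neighbour imputation on a toy table. It is not the authors' pipeline: the abstract does not name the six methods evaluated, so the k-nearest-neighbour variant here is only an assumed stand-in for a "machine learning based" method, and the function names (`listwise_delete`, `mean_impute`, `knn_impute`) and the toy data are hypothetical.

```python
from math import sqrt

MISSING = None  # marker for a missing entry in this toy example (an assumption)


def listwise_delete(rows):
    """Drop every row that contains at least one missing value."""
    return [r for r in rows if all(v is not MISSING for v in r)]


def mean_impute(rows):
    """Replace each missing value with the mean of the observed values in its column."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not MISSING]
        means.append(sum(observed) / len(observed))
    return [[means[j] if v is MISSING else v for j, v in enumerate(r)] for r in rows]


def knn_impute(rows, k=2):
    """Fill each missing value with the column mean over the k nearest donor rows.

    Distance is a root-mean-square difference over the columns observed in
    both rows; rows that share no observed column are treated as infinitely far.
    """
    def distance(a, b):
        shared = [(x, y) for x, y in zip(a, b)
                  if x is not MISSING and y is not MISSING]
        if not shared:
            return float("inf")
        return sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    filled = []
    for i, r in enumerate(rows):
        new_row = list(r)
        for j, v in enumerate(r):
            if v is MISSING:
                # Donors are the other rows that have column j observed.
                donors = [(distance(r, o), o[j]) for m, o in enumerate(rows)
                          if m != i and o[j] is not MISSING]
                donors.sort(key=lambda d: d[0])
                nearest = [val for _, val in donors[:k]]
                new_row[j] = sum(nearest) / len(nearest)
        filled.append(new_row)
    return filled


# Hypothetical toy data: two numeric features, two of five rows incomplete.
data = [
    [1.0, 10.0],
    [2.0, None],
    [3.0, 30.0],
    [None, 40.0],
    [5.0, 50.0],
]

complete = listwise_delete(data)    # keeps only the fully observed rows
mean_filled = mean_impute(data)     # keeps all rows, fills with column means
knn_filled = knn_impute(data, k=2)  # keeps all rows, fills from similar rows
```

On this toy table, listwise deletion keeps three of the five rows, while both imputation variants keep all five. With a missing rate above 50%, as discussed in the abstract, deletion would discard most of the data available to a downstream classifier.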
Pages: 8