A survey on missing data in machine learning

Cited by: 0
Authors
Tlamelo Emmanuel
Thabiso Maupong
Dimane Mpoeleng
Thabo Semong
Banyatsang Mphago
Oteng Tabona
Affiliation
[1] Botswana International University of Science and Technology, Department of Computer Science and Information Systems
Keywords
Missing data; Imputation; Machine learning
Abstract
Machine learning has been a cornerstone of analysing and extracting information from data, and missing values are a problem it often encounters. Missing values are commonly categorised by mechanism as missing completely at random, missing at random, or missing not at random. They may result from system malfunction during data collection or from human error during data pre-processing. It is nevertheless important to deal with missing values before analysing data, since ignoring or omitting them may lead to biased or misinformed analysis. The literature contains several proposals for handling missing values. In this paper, we aggregate some of the literature on missing data, focusing particularly on machine learning techniques. We also give insight into how the machine learning approaches work by highlighting the key features of missing-value imputation techniques, how they perform, their limitations, and the kind of data for which they are most suitable. We propose and evaluate two methods: k-nearest neighbor and an iterative imputation method (missForest) based on the random forest algorithm. Evaluation is performed on the Iris data and a novel power plant fan data set with induced missing values at missingness rates of 5% to 20%. We show that both missForest and k-nearest neighbor can successfully handle missing values, and we offer some possible directions for future research.
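The evaluation outlined in the abstract (induce missing values at a chosen rate, impute them, compare against the held-out truth) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `induce_mcar`, `knn_impute` and `rmse`, the six Iris-style rows, the choice of k = 3, the partial-distance rescaling and the RMSE metric are all assumptions made for illustration. Only k-nearest neighbor imputation is sketched; missForest is not reproduced here.

```python
import math
import random

# Six Iris-style rows (sepal length, sepal width, petal length, petal width).
# Illustrative values only; the paper's actual data handling is not shown here.
DATA = [
    [5.1, 3.5, 1.4, 0.2],
    [4.9, 3.0, 1.4, 0.2],
    [7.0, 3.2, 4.7, 1.4],
    [6.4, 3.2, 4.5, 1.5],
    [6.3, 3.3, 6.0, 2.5],
    [5.8, 2.7, 5.1, 1.9],
]


def induce_mcar(rows, rate, seed=0):
    """Copy rows and set round(rate * cells) randomly chosen cells to None."""
    rng = random.Random(seed)
    cells = [(i, j) for i in range(len(rows)) for j in range(len(rows[0]))]
    masked = [list(r) for r in rows]
    for i, j in rng.sample(cells, round(rate * len(cells))):
        masked[i][j] = None
    return masked


def _distance(a, b):
    """Euclidean distance over features observed in both rows, rescaled
    by the share of usable features; infinite if no feature is shared."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    squared = sum((x - y) ** 2 for x, y in shared)
    return math.sqrt(squared * len(a) / len(shared))


def knn_impute(rows, k=3):
    """Fill each None with the mean of that feature over the k nearest rows
    that observe it; fall back to the column mean if no row is comparable."""
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            donors = sorted(
                (_distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            )
            nearest = [v for d, v in donors[:k] if d < math.inf]
            if not nearest:
                nearest = [v for _, v in donors]
            filled[i][j] = sum(nearest) / len(nearest)
    return filled


def rmse(truth, imputed, masked):
    """Root mean squared error over the cells that were masked out."""
    errors = [
        (truth[i][j] - imputed[i][j]) ** 2
        for i in range(len(truth))
        for j in range(len(truth[0]))
        if masked[i][j] is None
    ]
    return math.sqrt(sum(errors) / len(errors))


# Sweep the missingness rates named in the abstract (5% to 20%).
for rate in (0.05, 0.10, 0.20):
    masked = induce_mcar(DATA, rate, seed=1)
    filled = knn_impute(masked, k=3)
    print(f"rate={rate:.2f}  RMSE={rmse(DATA, filled, masked):.3f}")
```

The masking step here is missing completely at random, because cells are chosen without reference to any value. In practice a library routine such as scikit-learn's `KNNImputer` would be the more usual choice than hand-written code like this.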