Missing Data Imputation Through the Use of the Random Forest Algorithm

Authors
Pantanowitz, Adam [1]
Marwala, Tshilidzi [1]
Affiliations
[1] Univ Witwatersrand, Sch Elect & Informat Engn, ZA-2050 Johannesburg, South Africa
Keywords
auto-associative; imputation; missing data; neural network; random forest
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
This paper presents a comparison of different paradigms used for missing data imputation. The data set used is HIV seroprevalence data from an antenatal clinic study survey performed in 2001. Data imputation is performed through five methods: Random Forests; auto-associative neural networks with genetic algorithms; auto-associative neuro-fuzzy configurations; and two hybrids based on random forests and neural networks. Results indicate that Random Forests are superior in imputing missing data for the given data set, both in accuracy and in computation time, with accuracy increases of up to 32% on average for certain variables when compared with auto-associative networks. While the concept of hybrid systems has promise, the presented systems appear to be hindered by their auto-associative neural network components.
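The abstract does not describe how the Random Forest imputer is built, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a single column with missing entries (marked `None`) and fully observed predictor columns, and it swaps in a heavily simplified forest: bootstrap-aggregated depth-1 regression trees, each restricted to one randomly chosen feature. The function names (`fit_stump`, `fit_forest`, `predict`, `impute`) and the synthetic data are hypothetical and unrelated to the HIV survey data.

```python
import random
from statistics import mean


def fit_stump(rows, targets, feature):
    """Fit a depth-1 regression tree on one feature.

    Returns (feature, threshold, left_mean, right_mean).
    """
    best = None
    values = sorted(set(r[feature] for r in rows))
    # Candidate thresholds are midpoints between consecutive distinct values,
    # so both sides of every candidate split are non-empty.
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2
        left = [t for r, t in zip(rows, targets) if r[feature] <= thr]
        right = [t for r, t in zip(rows, targets) if r[feature] > thr]
        lm, rm = mean(left), mean(right)
        sse = sum((t - lm) ** 2 for t in left) + sum((t - rm) ** 2 for t in right)
        if best is None or sse < best[0]:
            best = (sse, thr, lm, rm)
    if best is None:
        # The feature is constant in this sample: always predict the mean.
        m = mean(targets)
        return (feature, float("inf"), m, m)
    return (feature, best[1], best[2], best[3])


def fit_forest(rows, targets, n_trees=50, seed=0):
    """Bagged stumps: each tree gets a bootstrap sample and one random feature."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # sample with replacement
        feature = rng.randrange(len(rows[0]))
        forest.append(
            fit_stump([rows[i] for i in idx], [targets[i] for i in idx], feature)
        )
    return forest


def predict(forest, row):
    """Average the predictions of all trees (regression-style aggregation)."""
    return mean([lm if row[f] <= thr else rm for f, thr, lm, rm in forest])


def impute(records, target_col, n_trees=50, seed=0):
    """Return a copy of records with None entries in target_col filled in.

    Assumption: every other column is fully observed.
    """
    def features(rec):
        return [v for j, v in enumerate(rec) if j != target_col]

    complete = [r for r in records if r[target_col] is not None]
    forest = fit_forest(
        [features(r) for r in complete],
        [r[target_col] for r in complete],
        n_trees,
        seed,
    )
    filled = []
    for r in records:
        if r[target_col] is None:
            guess = predict(forest, features(r))
            filled.append(r[:target_col] + [guess] + r[target_col + 1:])
        else:
            filled.append(list(r))
    return filled


# Synthetic data: the third column is a step function of the first two.
records = [[i, 2 * i + i % 2, 10 if i >= 10 else 0] for i in range(20)]
records[2][2] = None   # true value was 0
records[17][2] = None  # true value was 10
filled = impute(records, target_col=2)
print(filled[2][2], filled[17][2])
```

A full Random Forest grows deeper trees and redraws the candidate features at every split. When several columns have gaps, a common extension is to cycle through the columns and re-impute until the values stabilise.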
Pages: 53-62
Page count: 10