An adaptive Laplacian weight random forest imputation for imbalance and mixed-type data

Cited by: 10
Authors
Ren, Lijuan [1, 2]
Seklouli, Aicha Sekhari [1]
Zhang, Haiqing [2]
Wang, Tao [3]
Bouras, Abdelaziz [4]
Affiliations
[1] Univ Claude Bernard Lyon 1, Univ Lyon, Univ Lyon 2, INSA Lyon, DISP UR4570, F-69676 Bron, France
[2] Chengdu Univ Informat Technol, Sch software engn, Chengdu 610225, Peoples R China
[3] Univ Claude Bernard Lyon 1, Univ Lyon, Univ Jean Monnet St Etienne, INSA Lyon, Univ Lyon 2, DISP UR4570, F-42300 Roanne, France
[4] Qatar Univ, Coll Engn, CSE, 2713, Doha, Qatar
Keywords
Missing values; Imputation; Random forest; Mixed-type; Imbalanced; MISSING VALUE IMPUTATION; DECISION TREES; DATA SETS; SMOTE;
DOI
10.1016/j.is.2022.102122
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
The growing use of information technology in the medical field is producing large amounts of medical data. Because participants withdraw early or refuse to take part, these data contain many missing values. Although various methods for handling missing values have been proposed, few address medical data that are both imbalanced and mixed-type. In this work, we propose an adaptive Laplacian weight random forest, called ALWRF. In ALWRF, feature weights are adjusted dynamically during model construction, which increases the selection probability of features with a low Laplacian score and high importance. A random operator is also introduced to increase the diversity of the trees. Furthermore, we propose an imputation method for imbalanced and mixed-type data, called SncALWRFI, based on the SMOTE-NC oversampling technique and the ALWRF method. Bayesian optimization and cross-validation are employed to search for optimal parameters. The experimental results show that ALWRF outperforms random forest and Bayesian-optimized random forest in classification and regression accuracy. In the missing-value experiments, SncALWRFI achieved the best imputation accuracy and proved effective on public datasets that are imbalanced and mixed-type. (c) 2022 Elsevier Ltd. All rights reserved.
Pages: 17
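The abstract states that ALWRF raises the selection probability of features with a low Laplacian score and high importance, but it gives no formula. As a rough illustration only, the Python sketch below computes the standard Laplacian score on a k-nearest-neighbour graph and turns it into feature sampling probabilities. The function names, the parameters `k` and `t`, and the combination rule `importance / (1 + score)` are assumptions made for this example and are not taken from the paper.

```python
import numpy as np


def laplacian_scores(X, k=3, t=1.0):
    """Laplacian score of each column of X (lower = better locality preservation)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    # Pairwise squared Euclidean distances between samples.
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    # k nearest neighbours of each sample, excluding the sample itself.
    masked = sq.copy()
    np.fill_diagonal(masked, np.inf)
    nn = np.argsort(masked, axis=1)[:, :k]
    # Heat-kernel weights on the symmetrised kNN graph.
    S = np.zeros((n, n))
    rows = np.repeat(np.arange(n), k)
    cols = nn.ravel()
    S[rows, cols] = np.exp(-sq[rows, cols] / t)
    S = np.maximum(S, S.T)
    d = S.sum(axis=1)      # node degrees (diagonal of D)
    L = np.diag(d) - S     # graph Laplacian
    scores = np.empty(p)
    for r in range(p):
        f = X[:, r]
        f_tilde = f - (f @ d) / d.sum()  # remove the degree-weighted mean
        denom = f_tilde @ (d * f_tilde)
        # Constant features carry no information; give them the worst score.
        scores[r] = (f_tilde @ L @ f_tilde) / denom if denom > 0 else np.inf
    return scores


def feature_sampling_probs(lap_scores, importances):
    """Assumed rule (not from the paper): favour high importance, low score."""
    lap_scores = np.asarray(lap_scores, dtype=float)
    importances = np.asarray(importances, dtype=float)
    raw = importances / (1.0 + lap_scores)
    return raw / raw.sum()


# Toy data: column 0 separates two groups, column 1 varies within each group.
X = np.array([[g * 10.0, float(j)] for g in (0, 1) for j in range(10)])
scores = laplacian_scores(X, k=3)
probs = feature_sampling_probs(scores, importances=[0.5, 0.5])

# Draw one candidate split feature according to the weighted probabilities.
rng = np.random.default_rng(0)
chosen = rng.choice(X.shape[1], size=1, p=probs)
```

With equal importances, the column that is constant within each group receives the lower Laplacian score and therefore the higher sampling probability. In a forest, such probabilities would presumably be recomputed as importances change during construction, which is the "adaptive" aspect the abstract mentions.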