An adaptive Laplacian weight random forest imputation for imbalance and mixed-type data

Cited by: 10

Authors:

Ren, Lijuan ^{[1,2]}

Seklouli, Aicha Sekhari ^{[1]}

Zhang, Haiqing ^{[2]}

Wang, Tao ^{[3]}

Bouras, Abdelaziz ^{[4]}

Affiliations:

[1] Univ Claude Bernard Lyon 1, Univ Lyon, Univ Lyon 2, INSA Lyon, DISP UR4570, F-69676 Bron, France

[2] Chengdu Univ Informat Technol, Sch Software Engn, Chengdu 610225, Peoples R China

[3] Univ Claude Bernard Lyon 1, Univ Lyon, Univ Jean Monnet St Etienne, INSA Lyon, Univ Lyon 2, DISP UR4570, F-42300 Roanne, France

[4] Qatar Univ, Coll Engn, CSE, 2713, Doha, Qatar

Source:

INFORMATION SYSTEMS | 2023 / Vol. 111

Keywords:

Missing values; Imputation; Random forest; Mixed-type; Imbalanced; MISSING VALUE IMPUTATION; DECISION TREES; DATA SETS; SMOTE;

DOI:

10.1016/j.is.2022.102122

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

The application of information technology in the medical field is producing large amounts of medical data. Because participants withdraw early or refuse to take part, these data contain many missing values. Although various methods for handling missing values have been proposed, few address medical data that are both imbalanced and mixed-type. In this work, we propose an adaptive Laplacian weight random forest, called ALWRF. In ALWRF, feature weights are adjusted dynamically during model construction, which increases the selection probabilities of features with a low Laplacian score and high importance. A random operator is also introduced to increase the diversity of the trees. Furthermore, we propose an imputation method for imbalanced and mixed-type data, called SncALWRFI, which is based on the SMOTE-NC oversampling technique and the ALWRF method. Bayesian optimization and cross-validation are employed to search for optimal parameters. The experimental results show that ALWRF outperforms random forest and Bayesian-optimized random forest in terms of classification and regression accuracy. In the missing-value experiments, SncALWRFI achieved the best imputation accuracy, and it imputed effectively on public datasets that are imbalanced and mixed-type. (c) 2022 Elsevier Ltd. All rights reserved.
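
The abstract does not give the weighting formula, so the following is only a rough sketch of the general idea, not the authors' ALWRF. It computes the standard Laplacian score of each numeric feature on a k-nearest-neighbour graph and then turns scores and importances into feature-selection probabilities. The function names, the parameters `k` and `t`, and the combination rule `importance / (1 + score)` are all assumptions made for illustration.

```python
import numpy as np

def laplacian_scores(X, k=5, t=1.0):
    """Laplacian score per column of X (lower = better locality preservation)."""
    n = X.shape[0]
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    masked = sq.copy()
    np.fill_diagonal(masked, np.inf)           # a point is not its own neighbour
    nn = np.argsort(masked, axis=1)[:, :k]     # k nearest neighbours per row
    rows = np.repeat(np.arange(n), k)
    cols = nn.ravel()
    S = np.zeros((n, n))
    S[rows, cols] = np.exp(-sq[rows, cols] / t)  # heat-kernel edge weights
    S = np.maximum(S, S.T)                     # symmetrise the graph
    d = S.sum(axis=1)                          # node degrees
    L = np.diag(d) - S                         # graph Laplacian
    scores = np.empty(X.shape[1])
    for r in range(X.shape[1]):
        f = X[:, r]
        f_t = f - (f @ d) / d.sum()            # remove degree-weighted mean
        denom = f_t @ (d * f_t)
        scores[r] = (f_t @ L @ f_t) / denom if denom > 0 else np.inf
    return scores

def selection_probabilities(lap_scores, importances):
    """Hypothetical rule: favour low Laplacian score and high importance."""
    lap = np.asarray(lap_scores, dtype=float)
    imp = np.asarray(importances, dtype=float)
    w = imp / (1.0 + lap)                      # assumption, not from the paper
    total = w.sum()
    if total <= 0:
        return np.full(len(w), 1.0 / len(w))   # fall back to uniform sampling
    return w / total
```

The resulting probabilities could be passed to a weighted sampler, for example `np.random.default_rng().choice(n_features, size=m, replace=False, p=p)`, when picking candidate features at a tree node. Categorical columns would need a suitable distance or encoding before the graph is built, which this sketch leaves out.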


Pages: 17

50 items in total

[1] Imputation Strategies for Clustering Mixed-Type Data with Missing Values
Rabea Aschenbruck
Gero Szepannek
Adalbert F. X. Wilhelm
Journal of Classification, 2023, 40 : 2 - 24
[2] Imputation Strategies for Clustering Mixed-Type Data with Missing Values
Aschenbruck, Rabea
Szepannek, Gero
Wilhelm, Adalbert F. X.
JOURNAL OF CLASSIFICATION, 2023, 40 (01) : 2 - 24
[3] High-dimensional large-scale mixed-type data imputation under missing at random
Liu, Wei
Li, Guizhen
Zhou, Ling
Luo, Lan
SCIENCE CHINA-MATHEMATICS, 2025, 68 (04) : 969 - 1000
[4] High-dimensional large-scale mixed-type data imputation under missing at random
Wei Liu
Guizhen Li
Ling Zhou
Lan Luo
Science China(Mathematics), 2025, 68 (04) : 969 - 1000
[5] Random Forest Modelling of High-Dimensional Mixed-Type Data for Breast Cancer Classification
Quist, Jelmar
Taylor, Lawson
Staaf, Johan
Grigoriadis, Anita
CANCERS, 2021, 13 (05) : 1 - 15
[6] MissForest-non-parametric missing value imputation for mixed-type data
Stekhoven, Daniel J.
Buehlmann, Peter
BIOINFORMATICS, 2012, 28 (01) : 112 - 118
[7] Mixed-Type Imputation for Missing Data Credal Classification via Quality Matrices
Zhang, Zuowei
Liu, Zhunga
Tian, Hongpeng
Martin, Arnaud
IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS, 2024, 54 (08) : 4772 - 4785
[8] Data depth for mixed-type data through MDS. An application to biological age imputation
Cascos, Ignacio
Grane, Aurea
Qian, Jingye
SOCIO-ECONOMIC PLANNING SCIENCES, 2025, 98
[9] Review and evaluation of imputation methods for multivariate longitudinal data with mixed-type incomplete variables
Cao, Yi
Allore, Heather
Vander Wyk, Brent
Gutman, Roee
STATISTICS IN MEDICINE, 2022, 41 (30) : 5844 - 5876
[10] Exploiting mutual information for the imputation of static and dynamic mixed-type clinical data with an adaptive k-nearest neighbours approach
Erica Tavazzi
Sebastian Daberdaku
Rosario Vasta
Andrea Calvo
Adriano Chiò
Barbara Di Camillo
BMC Medical Informatics and Decision Making, 20
