Effects of single and multiple imputation strategies on addressing over-fitting issues caused by imbalanced data from various scenarios

Times cited: 0
Authors
Yang, Jiaxi [1]
Wang, Yihan [1]
Yang, Ye [2]
Ding, Kai [1]
Na, Chongning [1]
Yang, Yao [1]
Affiliations
[1] Zhejiang Lab, Hangzhou 311100, Zhejiang, Peoples R China
[2] Zhejiang Univ, Coll Control Sci & Engn, Hangzhou 310058, Zhejiang, Peoples R China
Keywords
Missing imputation; Imbalanced data; Simulation study; Missing data imputation
DOI
10.1007/s10489-024-05295-3
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Missing values are a critical issue in most machine learning tasks, because they can alter the distribution of the training data and consequently lead to overfitting. The theory of missing value imputation is fairly mature, and numerous imputation models have been proposed. However, little research has addressed the underlying causes of missing values, or scenarios in which imbalanced data are significantly correlated with the target variable because of business logic. In this study, we conducted simulations to evaluate the performance of six imputation models on six datasets under three missingness mechanisms (random dropout, imbalance dropout based on features, and imbalance dropout based on labels), with the aim of identifying an appropriate approach for imbalanced missing data that follow certain patterns. In a real-world application, recognizing the missing pattern and imputing the data with a suitable method significantly mitigated the overfitting caused by missingness.
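The three missingness mechanisms named in the abstract can be illustrated with a small simulation. The following is a minimal Python sketch under our own assumptions, not the authors' code: the synthetic data generator, the dropout probabilities, and the function names (`make_data`, `random_dropout`, `feature_dropout`, `label_dropout`, `mean_impute`, `missing_rate_by_label`) are hypothetical. Mean imputation stands in for only the simplest form of single imputation; the paper's six imputation models are not reproduced here.

```python
import random
from dataclasses import dataclass, replace
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Row:
    x1: float            # always-observed feature
    x2: Optional[float]  # feature subject to dropout (None means missing)
    y: int               # binary label


def make_data(n: int, seed: int = 0) -> List[Row]:
    """Hypothetical synthetic data: x2 and y both depend on x1; y=1 is the minority class."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x1 = rng.gauss(0.0, 1.0)
        x2 = 2.0 * x1 + rng.gauss(0.0, 0.5)
        y = 1 if x1 + rng.gauss(0.0, 0.5) > 1.0 else 0
        rows.append(Row(x1, x2, y))
    return rows


def random_dropout(rows: List[Row], rate: float, seed: int = 0) -> List[Row]:
    """Random dropout: every x2 is removed with the same probability `rate`."""
    rng = random.Random(seed)
    return [replace(r, x2=None) if rng.random() < rate else r for r in rows]


def feature_dropout(rows: List[Row], threshold: float, p_high: float,
                    p_low: float, seed: int = 0) -> List[Row]:
    """Feature-based dropout: x2 is removed with p_high when x1 > threshold, else p_low."""
    rng = random.Random(seed)
    out = []
    for r in rows:
        p = p_high if r.x1 > threshold else p_low
        out.append(replace(r, x2=None) if rng.random() < p else r)
    return out


def label_dropout(rows: List[Row], p_pos: float, p_neg: float,
                  seed: int = 0) -> List[Row]:
    """Label-based dropout: x2 is removed with p_pos when y == 1, else p_neg."""
    rng = random.Random(seed)
    out = []
    for r in rows:
        p = p_pos if r.y == 1 else p_neg
        out.append(replace(r, x2=None) if rng.random() < p else r)
    return out


def mean_impute(rows: List[Row]) -> List[Row]:
    """Single imputation: fill each missing x2 with the mean of the observed x2 values."""
    observed = [r.x2 for r in rows if r.x2 is not None]
    if not observed:
        raise ValueError("no observed x2 values to compute a mean from")
    mean = sum(observed) / len(observed)
    return [replace(r, x2=mean) if r.x2 is None else r for r in rows]


def missing_rate_by_label(rows: List[Row]) -> Dict[int, float]:
    """Fraction of rows with missing x2, computed separately for each label present."""
    rates = {}
    for label in (0, 1):
        group = [r for r in rows if r.y == label]
        if group:
            rates[label] = sum(r.x2 is None for r in group) / len(group)
    return rates


if __name__ == "__main__":
    data = make_data(2000, seed=1)
    scenarios = {
        "random": random_dropout(data, rate=0.3),
        "feature": feature_dropout(data, threshold=1.0, p_high=0.9, p_low=0.1),
        "label": label_dropout(data, p_pos=0.9, p_neg=0.1),
    }
    for name, rows in scenarios.items():
        # Compare per-class missing rates; only label-based dropout targets y directly.
        print(name, missing_rate_by_label(rows))
```

Under `label_dropout`, whether x2 is missing carries information about y, so a model trained on mean-imputed data can learn the imputed constant as a proxy for the label instead of the feature itself. This is offered as one plausible route to the overfitting the abstract describes, not as a result reported in the paper.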
Pages: 2812-2830
Number of pages: 19
Source: Applied Intelligence, 2024, Vol. 54, pp. 2812-2830