Effects of single and multiple imputation strategies on addressing over-fitting issues caused by imbalanced data from various scenarios

Authors
Jiaxi Yang
Yihan Wang
Ye Yang
Kai Ding
Chongning Na
Yao Yang
Affiliations
[1] Zhejiang Lab, College of Control Science and Engineering
[2] Zhejiang University
Source
Applied Intelligence | 2024 / Vol. 54
Keywords
Missing imputation; Imbalanced data; Simulation study
Abstract
Missing values are a critical issue in most machine learning tasks: they can alter the distribution of the training data and consequently lead to overfitting. The theory of missing-value imputation has reached considerable maturity, and numerous imputation models have been proposed. However, little research has examined the underlying causes of missingness, or scenarios in which imbalanced data are significantly correlated with the target variable because of business logic. In this study, we conducted simulation studies to evaluate six imputation models on six datasets under three missingness mechanisms (random dropout, feature-based imbalanced dropout, and label-based imbalanced dropout), with the aim of identifying an appropriate approach for imbalanced missing data that follow specific patterns. In a real-world application, recognizing the missingness pattern and imputing the data with a suitable method significantly mitigated the overfitting caused by missingness.
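The abstract names three missingness mechanisms but does not define them in detail, so the following is only one plausible reading, not the authors' simulation code. It assumes that random dropout removes each cell of a chosen column with a fixed probability, that feature-based imbalanced dropout makes that probability depend on whether another, fully observed feature exceeds a threshold, and that label-based imbalanced dropout makes it depend on the class label. All function names, probabilities, and the toy data are hypothetical, and only the Python standard library is used.

```python
import random
from typing import Dict, List, Optional, Sequence

# A row is a list of feature values; None marks a missing cell.
Row = List[Optional[float]]


def _copy(X: Sequence[Sequence[float]]) -> List[Row]:
    """Copy row by row so the caller's data is never mutated."""
    return [list(row) for row in X]


def drop_random(X: Sequence[Sequence[float]], col: int, p: float,
                rng: random.Random) -> List[Row]:
    """Random dropout (assumed MCAR): every cell in column `col` is removed
    with the same probability p, independent of features and label."""
    out = _copy(X)
    for row in out:
        if rng.random() < p:
            row[col] = None
    return out


def drop_by_feature(X: Sequence[Sequence[float]], col: int, driver: int,
                    threshold: float, p_high: float, p_low: float,
                    rng: random.Random) -> List[Row]:
    """Feature-based imbalanced dropout (hypothetical form): column `col` is
    removed with probability p_high when the observed feature `driver`
    exceeds `threshold`, otherwise with p_low. Assumes driver != col."""
    out = _copy(X)
    for row in out:
        p = p_high if row[driver] > threshold else p_low
        if rng.random() < p:
            row[col] = None
    return out


def drop_by_label(X: Sequence[Sequence[float]], y: Sequence[int], col: int,
                  p_pos: float, p_neg: float,
                  rng: random.Random) -> List[Row]:
    """Label-based imbalanced dropout (hypothetical form): column `col` is
    removed with probability p_pos for label-1 rows, p_neg otherwise."""
    out = _copy(X)
    for row, label in zip(out, y):
        p = p_pos if label == 1 else p_neg
        if rng.random() < p:
            row[col] = None
    return out


def missing_rate_by_label(X: Sequence[Row], y: Sequence[int],
                          col: int) -> Dict[int, float]:
    """Fraction of rows whose column `col` is missing, per class label."""
    missing: Dict[int, int] = {}
    total: Dict[int, int] = {}
    for row, label in zip(X, y):
        total[label] = total.get(label, 0) + 1
        missing[label] = missing.get(label, 0) + (row[col] is None)
    return {label: missing[label] / total[label] for label in total}


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical toy data: 1,000 rows, 3 features; the label depends only
    # on feature 2, so feature 1 (the dropout driver) is independent of it.
    X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(1000)]
    y = [int(row[2] + rng.gauss(0, 1) > 0) for row in X]
    scenarios = {
        "random": drop_random(X, col=0, p=0.3, rng=rng),
        "feature": drop_by_feature(X, col=0, driver=1, threshold=0.0,
                                   p_high=0.6, p_low=0.05, rng=rng),
        "label": drop_by_label(X, y, col=0, p_pos=0.6, p_neg=0.05, rng=rng),
    }
    for name, Xm in scenarios.items():
        print(name, missing_rate_by_label(Xm, y, col=0))
```

Under these assumptions, the label-based scenario should show a much higher missing rate in the positive class than in the negative class, while the random and feature-based scenarios should show similar rates across classes. That gap is one way to see how label-correlated missingness can make the missingness pattern itself informative about the target.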
Pages: 2812-2830
Number of pages: 18
Citation
Yang, Jiaxi; Wang, Yihan; Yang, Ye; Ding, Kai; Na, Chongning; Yang, Yao. Effects of single and multiple imputation strategies on addressing over-fitting issues caused by imbalanced data from various scenarios. Applied Intelligence, 2024, 54(3): 2812-2830.