Method for Incomplete and Imbalanced Data Based on Multivariate Imputation by Chained Equations and Ensemble Learning

Cited by: 2
Authors
Li, Jiaxi [1]
Wang, Zhelong [1]
Wu, Lina [2]
Qiu, Sen [1]
Zhao, Hongyu [1]
Lin, Fang [1]
Zhang, Ke [1]
Affiliations
[1] Dalian University of Technology, School of Control Science and Engineering, Dalian 116024, People's Republic of China
[2] Liaoning Cancer Hospital & Institute, Shenyang 110042, People's Republic of China
Keywords
Training; Mathematical models; Ensemble learning; Task analysis; Costs; Support vector machines; Data models; Data incompleteness; class imbalance; physical fitness assessment; malignant tumor patients; multivariate imputation by chained equations; ensemble learning; MISSING DATA IMPUTATION; MULTIPLE IMPUTATION; PREHABILITATION; FRAMEWORK; HEALTH
DOI
10.1109/JBHI.2024.3376428
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Classifying incomplete and imbalanced data remains challenging because both problems can degrade classifier training, and we encountered both in our study of physical fitness assessment in patients. In fields such as healthcare, imputed values must also meet stricter accuracy requirements. To train a high-performance classifier that addresses both problems simultaneously, we propose a novel approach that combines multivariate imputation by chained equations with ensemble learning (MICEEN). Multivariate imputation by chained equations generates more accurate imputed values for the training set, which is then passed to an ensemble learner to build the predictor. In addition, missing values are introduced into minority-class samples, which are then used to generate new minority-class samples and balance the class distribution. We perform extensive experiments on real-world datasets to evaluate our method and compare it with other state-of-the-art approaches. Experimental results on benchmark datasets and on self-collected physical fitness assessment datasets of tumor patients with varying missing rates demonstrate the advantages of the proposed method.
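The abstract names two data-level steps: chained-equations imputation of the training set, and oversampling by introducing missing values into minority-class samples and filling them in again. The Python sketch below illustrates both ideas under stated assumptions; it is not the authors' MICEEN implementation. Assumptions: `None` marks a missing cell; each incomplete column is regressed linearly on all other columns; the imputation is single and deterministic (standard MICE draws imputations stochastically and usually produces several completed datasets); the masking rate and the choice to re-impute the masked copies only alongside the real minority rows are illustrative. The names `mice_impute` and `oversample_minority` are hypothetical. The ensemble-learning stage is omitted; any ensemble classifier could be trained on the balanced output.

```python
import random


def _solve(a, b):
    """Solve the small linear system a @ x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: bring the row with the largest entry in this column up.
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def _fit_linear(xs, ys, ridge=1e-6):
    """Least-squares weights [w0, w1, ...] for y ~ w0 + w . x; a tiny ridge keeps it solvable."""
    rows = [[1.0] + list(x) for x in xs]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) + (ridge if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(p)]
    return _solve(xtx, xty)


def mice_impute(data, n_iter=10):
    """Hypothetical single, deterministic chained-equations imputation.

    None marks a missing cell; every column needs at least one observed value.
    """
    n_rows, n_cols = len(data), len(data[0])
    missing = [[v is None for v in row] for row in data]
    filled = [list(row) for row in data]  # copy, so the input is not mutated
    # Initial fill: replace each missing cell with its column's observed mean.
    for j in range(n_cols):
        observed = [row[j] for row in data if row[j] is not None]
        mean = sum(observed) / len(observed)
        for row in filled:
            if row[j] is None:
                row[j] = mean
    # Chained equations: regress each incomplete column on all other columns,
    # fit on rows where it was observed, and overwrite only its missing cells.
    for _ in range(n_iter):
        for j in range(n_cols):
            if not any(m[j] for m in missing):
                continue
            others = [k for k in range(n_cols) if k != j]
            train = [i for i in range(n_rows) if not missing[i][j]]
            w = _fit_linear([[filled[i][k] for k in others] for i in train],
                            [filled[i][j] for i in train])
            for i in range(n_rows):
                if missing[i][j]:
                    filled[i][j] = w[0] + sum(wk * filled[i][k] for wk, k in zip(w[1:], others))
    return filled


def oversample_minority(x, y, minority, n_new, miss_rate=0.3, seed=0):
    """Hypothetical oversampler: mask cells in copies of minority rows, then re-impute them.

    Assumes x is already complete (e.g. the output of mice_impute).
    """
    rng = random.Random(seed)
    minority_rows = [list(row) for row, label in zip(x, y) if label == minority]
    n_cols = len(minority_rows[0])
    masked = []
    for _ in range(n_new):
        row = list(rng.choice(minority_rows))
        # Introduce missing values: blank out at least one randomly chosen cell.
        for c in rng.sample(range(n_cols), max(1, int(miss_rate * n_cols))):
            row[c] = None
        masked.append(row)
    # Impute the masked copies alongside the real minority rows only, so the
    # regressions reflect the minority-class distribution.
    synthetic = mice_impute(minority_rows + masked)[len(minority_rows):]
    return [list(r) for r in x] + synthetic, list(y) + [minority] * n_new
```

For example, `mice_impute([[1, 2, 3], [2, 1, 3], [3, 5, 8], [4, 2, 6], [5, 5, 10], [1, 1, None]])` fills the gap with a value very close to 2, because the third column equals the sum of the first two in every observed row. Because this sketch imputes deterministically, variety among the synthetic samples comes only from which rows and cells are masked; adding noise draws, as standard MICE does, would increase it.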
Pages: 3102-3113
Page count: 12