Method for Incomplete and Imbalanced Data Based on Multivariate Imputation by Chained Equations and Ensemble Learning

Cited by: 2
Authors
Li, Jiaxi [1 ]
Wang, Zhelong [1 ]
Wu, Lina [2 ]
Qiu, Sen [1 ]
Zhao, Hongyu [1 ]
Lin, Fang [1 ]
Zhang, Ke [1 ]
Affiliations
[1] Dalian Univ Technol, Sch Control Sci & Engn, Dalian 116024, Peoples R China
[2] Liaoning Canc Hosp & Inst, Shenyang 110042, Peoples R China
Keywords
Training; Mathematical models; Ensemble learning; Task analysis; Costs; Support vector machines; Data models; Data incompleteness; class imbalance; physical fitness assessment; malignant tumor patients; multivariate imputation by chained equations; ensemble learning; MISSING DATA IMPUTATION; MULTIPLE IMPUTATION; PREHABILITATION; FRAMEWORK; HEALTH;
DOI
10.1109/JBHI.2024.3376428
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Subject classification code
0812;
Abstract
Classifying incomplete and imbalanced data remains challenging because both issues can degrade classifier training; we encountered both in our study on the physical fitness assessment of patients. Fields such as healthcare also place higher demands on the accuracy of imputed values. To train a high-performance classifier, we propose a novel approach that combines multivariate imputation by chained equations with ensemble learning (MICEEN) and addresses the two problems simultaneously. We use multivariate imputation by chained equations to generate more accurate imputed values for the training set, which is then passed to ensemble learning to build a predictor. In addition, we introduce missing values into minority-class samples and use them to generate new minority-class samples, which balances the class distribution. We perform extensive experiments on real-world datasets to assess our method and compare it with other state-of-the-art approaches. Experimental results on benchmark datasets and on self-collected physical fitness assessment datasets of tumor patients, with varying missing rates, demonstrate the advantages of the proposed method.
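The two ideas in the abstract (chained-equation imputation, and creating minority-class samples by masking and re-imputing values) can be illustrated with a small sketch. This is not the authors' MICEEN implementation: it assumes only two numeric features, uses a deterministic single-predictor regression in place of full multivariate imputation by chained equations (which normally draws imputations with random noise and repeats the process several times), and omits the ensemble classifier. The names `fit_line`, `chained_impute`, and `oversample_minority` are hypothetical.

```python
import random
import statistics


def fit_line(xs, ys):
    """Least-squares slope and intercept for y ~ x with one predictor."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:  # constant predictor: fall back to the mean of y
        return 0.0, my
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def chained_impute(rows, n_iter=5):
    """Fill None entries in two-column rows by alternating regressions.

    Assumes every column has at least one observed value.
    """
    missing = [[v is None for v in r] for r in rows]
    # Initial fill: column means of the observed values.
    means = [
        statistics.fmean([r[j] for r in rows if r[j] is not None])
        for j in range(2)
    ]
    filled = [[means[j] if r[j] is None else r[j] for j in range(2)] for r in rows]
    for _ in range(n_iter):
        for j in range(2):  # column being imputed
            k = 1 - j       # the other column acts as predictor
            observed = [i for i in range(len(rows)) if not missing[i][j]]
            slope, intercept = fit_line(
                [filled[i][k] for i in observed],
                [filled[i][j] for i in observed],
            )
            for i in range(len(rows)):
                if missing[i][j]:
                    filled[i][j] = intercept + slope * filled[i][k]
    return filled


def oversample_minority(minority, n_new, seed=0):
    """Create new minority rows: copy a row, mask one feature, re-impute it."""
    rng = random.Random(seed)
    new_rows = []
    for _ in range(n_new):
        base = list(rng.choice(minority))
        base[rng.randrange(2)] = None  # deliberately introduce a missing value
        new_rows.append(chained_impute(minority + [base])[-1])
    return new_rows


# Toy data (made up for illustration only).
completed = chained_impute([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, None]])
minority = [[1.0, 2.1], [2.0, 3.9], [3.0, 6.2]]
synthetic = oversample_minority(minority, n_new=2)
```

In this sketch each synthetic row keeps one feature from an existing minority sample and takes the other from the regression fit, so new samples stay close to the observed minority distribution. A library imputer with stochastic draws would give more varied samples.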
Pages: 3102-3113
Number of pages: 12
Related papers
50 in total
  • [31] Ensemble based Data Imputation at the Edge
    Fountas, Panagiotis
    Kolomvatsos, Kostas
    2020 IEEE 32ND INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE (ICTAI), 2020, : 961 - 968
  • [32] A joint learning method for incomplete and imbalanced data in electronic health record based on generative adversarial networks
    Weng, Xutao
    Song, Hong
    Lin, Yucong
    Wu, You
    Zhang, Xi
    Liu, Bowen
    Yang, Jian
    COMPUTERS IN BIOLOGY AND MEDICINE, 2024, 168
  • [33] Multicriteria Classifier Ensemble Learning for Imbalanced Data
    Wegier, Weronika
    Koziarski, Michal
    Wozniak, Michal
    IEEE Access, 2022, 10 : 16807 - 16818
  • [34] An Improved Ensemble Learning for Imbalanced Data Classification
    Yuan, Zhengwu
    Zhao, Pu
    PROCEEDINGS OF 2019 IEEE 8TH JOINT INTERNATIONAL INFORMATION TECHNOLOGY AND ARTIFICIAL INTELLIGENCE CONFERENCE (ITAIC 2019), 2019, : 408 - 411
  • [35] Multicriteria Classifier Ensemble Learning for Imbalanced Data
    Wegier, Weronika
    Koziarski, Michal
    Wozniak, Michal
    IEEE ACCESS, 2022, 10 : 16807 - 16818
  • [36] Entropy-based hybrid sampling ensemble learning for imbalanced data
    Dongdong, Li
    Ziqiu, Chi
    Bolu, Wang
    Zhe, Wang
    Hai, Yang
    Wenli, Du
    INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS, 2021, 36 (07) : 3039 - 3067
  • [37] A synthetic neighborhood generation based ensemble learning for the imbalanced data classification
    Chen, Zhi
    Lin, Tao
    Xia, Xin
    Xu, Hongyan
    Ding, Sha
    APPLIED INTELLIGENCE, 2018, 48 (08) : 2441 - 2457
  • [38] Using Graph-Based Ensemble Learning to Classify Imbalanced Data
    Qin, Anyong
    Shang, Zhaowei
    Tian, Jinyu
    Zhang, Taiping
    Wang, Yulong
    Tang, Yuan Yan
    2017 3RD IEEE INTERNATIONAL CONFERENCE ON CYBERNETICS (CYBCONF), 2017, : 265 - 270
  • [39] A synthetic neighborhood generation based ensemble learning for the imbalanced data classification
    Zhi Chen
    Tao Lin
    Xin Xia
    Hongyan Xu
    Sha Ding
    Applied Intelligence, 2018, 48 : 2441 - 2457
  • [40] A Heterogeneous AdaBoost Ensemble Based Extreme Learning Machines for Imbalanced Data
    Abuassba, Adnan Omer
    Zhang, Dezheng
    Luo, Xiong
    INTERNATIONAL JOURNAL OF COGNITIVE INFORMATICS AND NATURAL INTELLIGENCE, 2019, 13 (03) : 19 - 35