Classification of imbalanced medical data: An empirical study of machine learning approaches

Cited by: 3
Authors
Mundra, Shikha [1]
Vijay, Shounak [1]
Mundra, Ankit [2]
Gupta, Punit [3]
Goyal, Mayank Kumar [4]
Kaur, Mandeep [4]
Khaitan, Supriya [5]
Rajpoot, Abha Kiran [4]
Affiliations
[1] Manipal Univ Jaipur, Dept Comp Sci & Engn, Jaipur, Rajasthan, India
[2] Manipal Univ Jaipur, Dept Informat Technol, Jaipur, Rajasthan, India
[3] Manipal Univ Jaipur, Dept Comp & Commun Engn, Jaipur, Rajasthan, India
[4] Sharda Univ, Sch Engn & Technol, Dept Comp Sci & Engn, Greater Noida, India
[5] Pillai Coll Engn, Dept Comp Engn, Navi Mumbai, India
Keywords
Synthetic Minority Oversampler (SMOTE); Random Oversampler (ROS); Adaptive Synthetic Sampler (ADASYN); Random Undersampler (RUS); near miss; Tomek Link (TL); One Sided Selection (OSS); Edited Nearest Neighbors (ENN); CHALLENGES; REGRESSION; SMOTE;
DOI
10.3233/JIFS-219294
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The health of thousands of patients around the world is affected by factors such as age, body mass index, cholesterol level, albumin level and several others. Timely prediction of health outcomes from these factors can serve as an early warning. Recent growth in machine learning algorithms inspired us to build a predictive model for better healthcare facilities. Our work focuses on the problem of noisy and imbalanced datasets, in which the majority class is favored over the minority class, leading to false predictions. We experimented with two publicly available imbalanced medical datasets of different sizes, both with a binary class label: MIT's GOSSIS death dataset and the PIMA Indians Diabetes Dataset. We investigated three oversampling techniques (Synthetic Minority Oversampler, Random Oversampler and Adaptive Synthetic Sampler) and two undersampling techniques (Random Undersampler and Near Miss), each paired with three data reduction and cleaning methods, namely Tomek Links, One Sided Selection and Edited Nearest Neighbors. We found that the combination of Adaptive Synthetic Sampler with One Sided Selection performs better on the large dataset, while the combination of Random Oversampler with Tomek Links performs better on the small dataset. We also observed that oversampling techniques give quite promising results in comparison with undersampling methods, specifically when applied with machine learning classifiers, as these classifiers are data-hungry algorithms.
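The pairing reported for the smaller dataset, random oversampling combined with Tomek link cleaning, can be sketched roughly as follows. This is a minimal standard-library illustration and not the authors' implementation: the function names, the toy data, the order of the two steps (oversample first, then clean) and the choice to drop only the majority-class member of each Tomek link are all assumptions made for the example.

```python
import random
from collections import Counter
from math import dist


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen samples of smaller classes until all
    classes have as many samples as the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        pool = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out


def nearest_neighbor(i, X):
    """Index of the closest other point to X[i] (Euclidean distance)."""
    return min((j for j in range(len(X)) if j != i),
               key=lambda j: dist(X[i], X[j]))


def remove_tomek_links(X, y, majority_label):
    """Binary-class sketch: two points of different classes that are each
    other's nearest neighbor form a Tomek link. Assumption: only the
    majority-class member of each link is dropped."""
    nn = [nearest_neighbor(i, X) for i in range(len(X))]
    drop = set()
    for i, j in enumerate(nn):
        if nn[j] == i and y[i] != y[j]:
            drop.add(i if y[i] == majority_label else j)
    keep = [k for k in range(len(X)) if k not in drop]
    return [X[k] for k in keep], [y[k] for k in keep]


# Hypothetical toy data: one feature per sample, label 1 is the minority.
X = [(0.0,), (1.0,), (1.2,), (5.0,), (6.0,)]
y = [0, 0, 1, 0, 0]

X_bal, y_bal = random_oversample(X, y)
X_clean, y_clean = remove_tomek_links(X_bal, y_bal, majority_label=0)
print(Counter(y_bal), Counter(y_clean))
```

One design point worth noting: random oversampling creates exact duplicates, so a duplicated minority point's nearest neighbor is its own copy, and fewer Tomek links may remain after oversampling than in the original data.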
Pages: 1933-1946
Number of pages: 14