Knowledge discovery from noisy imbalanced and incomplete binary class data

Cited by: 25
Authors
Puri, Arjun [1]
Gupta, Manoj Kumar [1]
Affiliations
[1] Shri Mata Vaishno Devi Univ, Comp Sci & Engn, Katra 182320, Jammu & Kashmir, India
Keywords
Missing value imputation techniques; Oversampling techniques; Noise; Binary class imbalanced data; Performance metrics; GENETIC ALGORITHM; CLASSIFICATION; IMPUTATION; FRAUD; SMOTE
DOI
10.1016/j.eswa.2021.115179
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Class imbalance considerably degrades the classification of instances by traditional classifiers, and when combined with other data difficulties it significantly hampers recognition of minority-class instances. Researchers have worked in various directions to mitigate the effect of class imbalance together with noise, and together with missing values, in datasets. However, a combined study of noisy, class-imbalanced, and incomplete datasets has not yet been performed. This article presents a detailed analysis of 84 different machine learning models for noisy, incomplete, binary class imbalanced data, using AUC, G-Mean, and F1-score as performance metrics. The experiments cover missing value imputation and oversampling techniques and comprise three comparisons: first, missing value imputation techniques on incomplete binary class imbalanced data; second, resampling techniques on noisy binary class imbalanced data; and third, combined techniques on noisy, incomplete, binary class imbalanced data. From the first comparison, we conclude that the MICE and KNN techniques perform well as the proportion of missing values in the imbalanced dataset increases. In the second comparison, the SMOTE-ENN technique performs better than state-of-the-art techniques on noisy binary class imbalanced datasets. In the third comparison, we conclude that MICE combined with SMOTE-ENN performs well compared with the rest of the techniques.
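The three metrics named in the abstract can be illustrated with a minimal pure-Python sketch. This is not the authors' code; the function names, the toy labels, and the toy scores below are hypothetical and chosen only to show why G-Mean, F1-score, and AUC are preferred over plain accuracy when one class is rare.

```python
import math

def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn), treating `positive` as the minority class."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        else:
            if t == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn

def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity and specificity."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)

def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def auc(y_true, scores, positive=1):
    """AUC as the fraction of (positive, negative) pairs ranked correctly,
    counting ties as half a correct pair."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one instance of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical toy data: 2 minority (1) and 8 majority (0) instances.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
scores = [0.9, 0.4, 0.1, 0.2, 0.3, 0.1, 0.2, 0.3, 0.35, 0.8]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print("accuracy:", accuracy)
print("G-Mean:  ", round(g_mean(y_true, y_pred), 4))
print("F1-score:", f1_score(y_true, y_pred))
print("AUC:     ", auc(y_true, scores))
```

On this toy data accuracy is 0.8 even though only half of the minority instances are found, while G-Mean and F1-score are noticeably lower, which is the behaviour that makes them more informative on imbalanced data.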
Pages: 14
Related papers
50 in total
  • [31] Local ensemble learning from imbalanced and noisy data for word sense disambiguation
    Krawczyk, Bartosz
    McInnes, Bridget T.
    PATTERN RECOGNITION, 2018, 78 : 103 - 119
  • [32] Explanation and prediction of clinical data with imbalanced class distribution based on pattern discovery and disentanglement
    Zhou, Pei-Yuan
    Wong, Andrew K. C.
    BMC MEDICAL INFORMATICS AND DECISION MAKING, 2021, 21 (01)
  • [33] Explanation and prediction of clinical data with imbalanced class distribution based on pattern discovery and disentanglement
    Pei-Yuan Zhou
    Andrew K. C. Wong
    BMC Medical Informatics and Decision Making, 21
  • [34] Comparing Boosting and Bagging Techniques With Noisy and Imbalanced Data
    Khoshgoftaar, Taghi M.
    Van Hulse, Jason
    Napolitano, Amri
    IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART A-SYSTEMS AND HUMANS, 2011, 41 (03): : 552 - 568
  • [35] Robust Thresholding Strategies for Highly Imbalanced and Noisy Data
    Johnson, Justin M.
    Khoshgoftaar, Taghi M.
    20TH IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA 2021), 2021, : 1182 - 1188
  • [36] Structural damage detection from incomplete and noisy modal test data
    Law, SS
    Shi, ZY
    Zhang, LM
    JOURNAL OF ENGINEERING MECHANICS-ASCE, 1998, 124 (11): : 1280 - 1288
  • [37] A Novel NMF Guided for Hyperspectral Unmixing From Incomplete and Noisy Data
    Dong, Le
    Lu, Xiaoqiang
    Liu, Ganchao
    Yuan, Yuan
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2022, 60
  • [38] Autonomous inference of complex network dynamics from incomplete and noisy data
    Gao, Ting-Ting
    Yan, Gang
    NATURE COMPUTATIONAL SCIENCE, 2022, 2 (03): : 160 - 168
  • [39] Structural damage detection from incomplete and noisy modal test data
    Law, S.S.
    Shi, Z.Y.
    Zhang, L.M.
    Journal of Engineering Mechanics, 1998, 124 (11): : 1280 - 1288
  • [40] Autonomous inference of complex network dynamics from incomplete and noisy data
    Ting-Ting Gao
    Gang Yan
    Nature Computational Science, 2022, 2 : 160 - 168