Knowledge discovery from noisy imbalanced and incomplete binary class data

Cited by: 25
Authors
Puri, Arjun [1]
Gupta, Manoj Kumar [1]
Affiliation
[1] Shri Mata Vaishno Devi Univ, Comp Sci & Engn, Katra 182320, Jammu & Kashmir, India
Keywords
Missing value imputation techniques; Oversampling techniques; Noise; Binary class imbalanced data; Performance metrics; GENETIC ALGORITHM; CLASSIFICATION; IMPUTATION; FRAUD; SMOTE;
DOI
10.1016/j.eswa.2021.115179
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Class imbalance has a considerable impact on the classification of instances by traditional classifiers. Combined with other difficulties, it makes instances of the minority class significantly harder to recognize. Researchers have worked in various directions to mitigate the effects of class imbalance, noise, and missing values in datasets. However, the combination of noise, class imbalance, and missing values has not yet been studied. This article presents a detailed analysis of 84 different machine learning models for dealing with noisy, binary class imbalanced, and incomplete data, using AUC, G-Mean, and F1-score as performance metrics. The experiments cover missing value imputation and oversampling techniques and are organized as three comparisons: first, missing value imputation techniques on incomplete binary class imbalanced data; second, resampling techniques on noisy binary class imbalanced data; and third, combined techniques on noisy, binary class imbalanced, and incomplete data. From the first comparison, we conclude that the MICE and KNN imputation techniques perform well as the proportion of missing values in the imbalanced datasets increases. In the second comparison, the SMOTE-ENN technique performs better than state-of-the-art techniques on noisy binary class imbalanced datasets. In the third comparison, we conclude that MICE combined with SMOTE-ENN performs well compared with the rest of the techniques.
Pages: 14
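The abstract evaluates models with AUC, G-Mean, and F1-score rather than plain accuracy. As an illustration only, and not the authors' code, the following Python sketch computes these three metrics from their standard definitions using only the standard library. It assumes that label 1 marks the minority (positive) class; the function names (`confusion_counts`, `g_mean`, `f1_score`, `roc_auc`) are hypothetical. AUC is computed as the probability that a randomly chosen positive instance scores higher than a randomly chosen negative one (the Mann-Whitney form), with ties counted as one half.

```python
from math import sqrt
from typing import Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple:
    """Return (tp, fp, tn, fn), assuming label 1 is the minority (positive) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def g_mean(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Geometric mean of sensitivity (minority recall) and specificity (majority recall)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sqrt(sensitivity * specificity)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Harmonic mean of precision and recall on the positive (minority) class."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    if tp == 0:
        return 0.0  # precision or recall is zero, so F1 is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative instance")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data: 8 majority (0) instances and 2 minority (1) instances.
    y_true = [0] * 8 + [1] * 2

    # A classifier that always predicts the majority class.
    all_majority = [0] * 10
    accuracy = sum(t == p for t, p in zip(y_true, all_majority)) / len(y_true)
    print(accuracy)                          # 0.8
    print(g_mean(y_true, all_majority))      # 0.0
    print(f1_score(y_true, all_majority))    # 0.0

    # A classifier that finds one of the two minority instances.
    y_pred = [0] * 7 + [1, 1, 0]
    scores = [0.1, 0.2, 0.3, 0.1, 0.2, 0.4, 0.3, 0.6, 0.7, 0.5]
    print(round(g_mean(y_true, y_pred), 3))  # 0.661
    print(f1_score(y_true, y_pred))          # 0.5
    print(roc_auc(y_true, scores))           # 0.9375
```

The toy run shows why these metrics suit imbalanced data: the majority-only classifier reaches 0.8 accuracy while scoring 0 on both G-Mean and F1, because it never recognizes a minority instance. The quadratic pairwise loop in `roc_auc` is chosen for clarity; a rank-based computation would scale better on large datasets.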