Knowledge discovery from noisy imbalanced and incomplete binary class data

Cited by: 25
Authors
Puri, Arjun [1 ]
Gupta, Manoj Kumar [1 ]
Affiliations
[1] Shri Mata Vaishno Devi Univ, Comp Sci & Engn, Katra 182320, Jammu & Kashmir, India
Keywords
Missing value imputation techniques; Oversampling techniques; Noise; Binary class imbalanced data; Performance metrics; GENETIC ALGORITHM; CLASSIFICATION; IMPUTATION; FRAUD; SMOTE;
DOI
10.1016/j.eswa.2021.115179
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Class imbalance considerably degrades the classification of instances by traditional classifiers, and, combined with other data difficulties, it makes instances of the minority class especially hard to recognize. Researchers have worked in various directions to mitigate the effect of class imbalance together with noise and with missing values in datasets. However, noise, class imbalance, and incompleteness have not yet been studied in combination. This article presents a detailed analysis of 84 machine learning models for noisy, imbalanced, and incomplete binary class data, using AUC, G-Mean, and F1-score as performance metrics, in experiments that cover missing value imputation and oversampling techniques. The article makes three comparisons: first, missing value imputation techniques on incomplete binary class imbalanced data; second, resampling techniques on noisy binary class imbalanced data; and third, combined techniques on noisy, incomplete binary class imbalanced data. From the first comparison, we conclude that the MICE and KNN techniques perform well as the proportion of missing values in the imbalanced dataset increases. In the second comparison, the SMOTE-ENN technique performs better than the state of the art on noisy binary class imbalanced datasets. In the third comparison, we conclude that MICE combined with SMOTE-ENN performs well compared to the rest of the techniques.
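The abstract names AUC, G-Mean, and F1-score as the performance metrics but does not define them. The following is a minimal sketch of their commonly used definitions, not the authors' code: G-Mean is taken here as the geometric mean of sensitivity and specificity, and AUC as the probability that a randomly chosen minority instance is scored above a randomly chosen majority instance. The function names and the `positive` parameter are assumptions made for illustration.

```python
from math import sqrt


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fn, fp, tn), treating `positive` as the minority class."""
    tp = fn = fp = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        else:
            if p == positive:
                fp += 1
            else:
                tn += 1
    return tp, fn, fp, tn


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity and specificity (assumed definition)."""
    tp, fn, fp, tn = confusion_counts(y_true, y_pred, positive)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sqrt(sensitivity * specificity)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    tp, fn, fp, _ = confusion_counts(y_true, y_pred, positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def auc(y_true, scores, positive=1):
    """Area under the ROC curve via pairwise ranking; ties count as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one instance of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

On a made-up sample with two minority and eight majority instances, where a classifier finds one of the two minority instances and raises one false alarm, accuracy is 0.8 while F1 is 0.5 and G-Mean is about 0.66. This illustrates why such metrics, rather than plain accuracy, are typically reported for imbalanced data.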
Pages: 14