Knowledge discovery from noisy imbalanced and incomplete binary class data

Cited by: 25
Authors
Puri, Arjun [1]
Gupta, Manoj Kumar [1]
Affiliations
[1] Shri Mata Vaishno Devi Univ, Comp Sci & Engn, Katra 182320, Jammu & Kashmir, India
Keywords
Missing value imputation techniques; Oversampling techniques; Noise; Binary class imbalanced data; Performance metrics; GENETIC ALGORITHM; CLASSIFICATION; IMPUTATION; FRAUD; SMOTE;
DOI
10.1016/j.eswa.2021.115179
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Class imbalance substantially degrades the classification of instances by traditional classifiers and, together with other data difficulties, makes minority-class instances especially hard to recognize. Researchers have worked in various directions to mitigate the effect of class imbalance along with noise and missing values in datasets. However, combined studies of noisy, class-imbalanced, and incomplete datasets have not yet been performed. This article presents a detailed analysis of 84 machine learning models for noisy, incomplete, binary class-imbalanced data, using AUC, G-Mean, and F1-score as performance metrics. The experiments cover missing value imputation and oversampling techniques and comprise three comparisons: first, missing value imputation techniques on incomplete binary class-imbalanced data; second, resampling techniques on noisy binary class-imbalanced data; and third, combined techniques on noisy, incomplete binary class-imbalanced data. From the first comparison, we conclude that the MICE and KNN techniques perform well as the proportion of missing values in the imbalanced dataset increases. In the second comparison, the SMOTE-ENN technique performs better than state-of-the-art techniques on noisy binary class-imbalanced datasets. In the third comparison, MICE combined with SMOTE-ENN performs well compared with the rest of the techniques.
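As an illustration of why the abstract reports G-Mean and F1-score rather than plain accuracy, the following minimal sketch computes both from a binary confusion matrix. It is not taken from the article: the function names and the toy labels are assumptions chosen for illustration, and AUC is omitted because it requires classifier scores rather than hard predictions.

```python
from math import sqrt


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn), treating `positive` as the minority class."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            tp += 1
        elif truth == positive:
            fn += 1
        elif pred == positive:
            fp += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity (minority recall) and specificity."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sqrt(sensitivity * specificity)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical toy data: 2 minority (1) and 8 majority (0) instances.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
all_majority = [0] * len(y_true)

print(g_mean(y_true, y_pred), f1_score(y_true, y_pred))
print(g_mean(y_true, all_majority), f1_score(y_true, all_majority))
```

On this toy data, a classifier that always predicts the majority class is 80% accurate yet scores zero on both G-Mean and F1, which is the failure mode these metrics are meant to expose on imbalanced data.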
Pages: 14
Related papers
50 in total
  • [1] Knowledge discovery from imbalanced and noisy data
    Van Hulse, Jason
    Khoshgoftaar, Taghi
    DATA & KNOWLEDGE ENGINEERING, 2009, 68 (12) : 1513 - 1542
  • [2] Supervised knowledge discovery from incomplete data
    Kalousis, A
    Hilario, M
    DATA MINING II, 2000, 2 : 269 - 278
  • [3] Discovery of incomplete knowledge in electrocardiographic data
    Azuaje, FJ
    Dubitzky, W
    Lopes, P
    Black, ND
    Adamson, K
    Wu, X
    White, JA
    PROCEEDING OF THE THIRD INTERNATIONAL CONFERENCE ON NEURAL NETWORKS AND EXPERT SYSTEMS IN MEDICINE AND HEALTHCARE, 1998, : 286 - 294
  • [4] Hyperspectral Unmixing from Incomplete and Noisy Data
    Montag, Martin J.
    Stephani, Henrike
    JOURNAL OF IMAGING, 2016, 2 (01)
  • [5] Online Learning From Incomplete and Imbalanced Data Streams
    You, Dianlong
    Xiao, Jiawei
    Wang, Yang
    Yan, Huigui
    Wu, Di
    Chen, Zhen
    Shen, Limin
    Wu, Xindong
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2023, 35 (10) : 10650 - 10665
  • [6] Learning from Imbalanced Data in Presence of Noisy and Borderline Examples
    Napierala, Krystyna
    Stefanowski, Jerzy
    Wilk, Szymon
    ROUGH SETS AND CURRENT TRENDS IN COMPUTING, PROCEEDINGS, 2010, 6086 : 158 - 167
  • [7] A fuzzy classifier for imbalanced and noisy data
    Visa, S
    Ralescu, A
    2004 IEEE INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS, VOLS 1-3, PROCEEDINGS, 2004, : 1727 - 1732
  • [8] Efficient Graph Learning From Noisy and Incomplete Data
    Berger, Peter
    Hannak, Gabor
    Matz, Gerald
    IEEE TRANSACTIONS ON SIGNAL AND INFORMATION PROCESSING OVER NETWORKS, 2020, 6 : 105 - 119
  • [9] IMAGE-RECONSTRUCTION FROM INCOMPLETE AND NOISY DATA
    GULL, SF
    DANIELL, GJ
    NATURE, 1978, 272 (5655) : 686 - 690
  • [10] Inferences from noisy and incomplete biological network data
    Stumpf, Michael P. H.
    RECENT PROGRESS IN COMPUTATIONAL SCIENCES AND ENGINEERING, VOLS 7A AND 7B, 2006, 7A-B : 764 - 767