Kernel density estimation based sampling for imbalanced class distribution

Cited by: 69
Authors
Kamalov, Firuz [1]
Affiliations
[1] Canadian Univ Dubai, Dept Elect Engn, Dubai, U Arab Emirates
Keywords
Kernel; KDE; Imbalanced data; Class imbalance; Sampling; Oversampling; FEATURE-SELECTION; CHALLENGES; SMOTE;
DOI
10.1016/j.ins.2019.10.017
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Imbalanced response variable distribution is a common occurrence in data science. In fields such as fraud detection, medical diagnostics, system intrusion detection, and many others where abnormal behavior is rarely observed, the data under study often features a disproportionate target class distribution. One common way to combat class imbalance is to resample the minority class to achieve a more balanced distribution. In this paper, we investigate the performance of a sampling method based on kernel density estimation (KDE). We believe that KDE offers a more natural way to generate new instances of the minority class and is less prone to overfitting than other standard sampling techniques. It is based on the well-established theory of nonparametric statistical estimation. Numerical experiments show that KDE can outperform other sampling techniques on a range of real-life datasets as measured by F1-score and G-mean. The results remain consistent across a number of classification algorithms used in the experiments. Furthermore, the proposed method outperforms the benchmark methods regardless of the class distribution ratio. We conclude, based on the solid theoretical foundation and strong experimental results, that the proposed method would be a valuable tool in problems involving imbalanced class distribution. (C) 2019 Elsevier Inc. All rights reserved.
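The idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not state the kernel, the bandwidth rule, or any function names, so the Gaussian product kernel, the Scott's-rule bandwidth, and the hypothetical helpers `kde_oversample` and `f1_and_g_mean` below are all assumptions. Sampling from a Gaussian KDE amounts to picking an observed minority point at random and perturbing it with Gaussian noise scaled by the bandwidth. The second helper computes the two evaluation measures named in the abstract using their standard binary-classification definitions.

```python
import math
import random
import statistics


def kde_oversample(minority, n_new, seed=0):
    """Draw n_new synthetic rows from a Gaussian KDE fitted to `minority`.

    Assumptions (not taken from the abstract): Gaussian product kernel,
    one bandwidth per feature chosen by Scott's rule.
    `minority` is a list of equal-length numeric rows (at least two rows).
    """
    rng = random.Random(seed)
    n, d = len(minority), len(minority[0])
    # Scott's rule: h_j = sigma_j * n^(-1 / (d + 4)) for feature j.
    factor = n ** (-1.0 / (d + 4))
    bandwidths = [
        factor * statistics.stdev([row[j] for row in minority])
        for j in range(d)
    ]
    samples = []
    for _ in range(n_new):
        # A KDE is an equal-weight mixture of kernels centred on the data,
        # so sampling = choose a centre, then add kernel-shaped noise.
        center = rng.choice(minority)
        samples.append([rng.gauss(c, h) for c, h in zip(center, bandwidths)])
    return samples


def f1_and_g_mean(y_true, y_pred, positive=1):
    """Return (F1-score, G-mean) for a binary problem.

    Assumes both classes occur in y_true and at least one true positive
    or error exists, so no denominator is zero.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    f1 = 2 * tp / (2 * tp + fp + fn)
    sensitivity = tp / (tp + fn)   # recall on the minority class
    specificity = tn / (tn + fp)   # recall on the majority class
    return f1, math.sqrt(sensitivity * specificity)


if __name__ == "__main__":
    minority_rows = [[0.0, 1.0], [1.0, 2.0], [2.0, 0.5], [1.5, 1.5]]
    synthetic = kde_oversample(minority_rows, n_new=6, seed=1)
    print(len(synthetic), "synthetic minority rows generated")
```

In practice the synthetic rows would be appended to the training set only, with the minority label, before fitting a classifier. Unlike duplicating existing rows, each synthetic row is a new point near an observed one, which is one reading of the abstract's claim that KDE sampling is less prone to overfitting.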
Pages: 1192-1201
Number of pages: 10
Related papers
50 in total
  • [1] KDOS: Kernel Density based Over Sampling: - A Solution to Skewed Class Distribution
    Gillala, Rekha
    Reddy, V. Krishna
    Tyagi, Amit Kumar
    [J]. JOURNAL OF INFORMATION ASSURANCE AND SECURITY, 2020, 15 (02): : 44 - 52
  • [2] A Kernel Density Estimation-Based Variation Sampling for Class Imbalance in Defect Prediction
    Zhang, Yuqing
    Yan, Xuefeng
    Khan, Arif Ali
    [J]. 2020 IEEE INTL SYMP ON PARALLEL & DISTRIBUTED PROCESSING WITH APPLICATIONS, INTL CONF ON BIG DATA & CLOUD COMPUTING, INTL SYMP SOCIAL COMPUTING & NETWORKING, INTL CONF ON SUSTAINABLE COMPUTING & COMMUNICATIONS (ISPA/BDCLOUD/SOCIALCOM/SUSTAINCOM 2020), 2020, : 1058 - 1065
  • [3] Probability Density Function Estimation Based Over-Sampling for Imbalanced Two-Class Problems
    Gao, Ming
    Hong, Xia
    Chen, Sheng
    Harris, Chris J.
    [J]. 2012 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2012,
  • [4] Adversarial Kernel Sampling on Class-imbalanced Data Streams
    Yang, Peng
    Li, Ping
    [J]. PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON INFORMATION & KNOWLEDGE MANAGEMENT, CIKM 2021, 2021, : 2352 - 2362
  • [5] Kernel Density Estimation Based on the Distinct Units in Sampling with Replacement
    Mostafa, Sayed A.
    Ahmad, Ibrahim A.
    [J]. Sankhya B, 2021, 83 : 507 - 547
  • [6] Kernel Density Estimation Based on the Distinct Units in Sampling with Replacement
    Mostafa, Sayed A.
    Ahmad, Ibrahim A.
    [J]. SANKHYA-SERIES B-APPLIED AND INTERDISCIPLINARY STATISTICS, 2021, 83 (02): : 507 - 547
  • [7] On kernel density estimation based on different stratified sampling with optimal allocation
    Samawi, Hani
    Chatterjee, Arpita
    Yin, JingJing
    Rochani, Haresh
    [J]. COMMUNICATIONS IN STATISTICS-THEORY AND METHODS, 2017, 46 (22) : 10973 - 10990
  • [8] Imbalanced ELM Based on Normal Density Estimation for Binary-Class Classification
    He, Yulin
    Ashfaq, Rana Aamir Raza
    Huang, Joshua Zhexue
    Wang, Xizhao
    [J]. TRENDS AND APPLICATIONS IN KNOWLEDGE DISCOVERY AND DATA MINING (PAKDD 2016), 2016, 9794 : 48 - 60
  • [9] An Over-sampling Method Based on Probability Density Estimation for Imbalanced Datasets Classification
    Cao, Lu
    Zhai, Yi-Kui
    [J]. PROCEEDINGS OF THE 2016 INTERNATIONAL CONFERENCE ON INTELLIGENT INFORMATION PROCESSING (ICIIP'16), 2016,
  • [10] SCD:Sampling-based Class Distribution for Imbalanced Semi-Supervised Learning
    Qiu, Haomiao
    Liu, Haixing
    Zhang, Chi
    [J]. 2023 IEEE 35TH INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE, ICTAI, 2023, : 567 - 572