Over-sampling algorithm for imbalanced data classification

Authors
XU Xiaolong [1]
CHEN Wen [2]
SUN Yanfei [3]
Affiliations
[1] Jiangsu Key Laboratory of Big Data Security & Intelligent Processing, Nanjing University of Posts and Telecommunications
[2] Institute of Big Data Research at Yancheng, Nanjing University of Posts and Telecommunications
[3] Office of Scientific R&D, Nanjing University of Posts and Telecommunications
Keywords
imbalanced data; density-based spatial clustering of applications with noise (DBSCAN); synthetic minority oversampling technique (SMOTE); over-sampling
DOI
Not available
Chinese Library Classification (CLC) number
TP311.13
Discipline classification code
1201
Abstract
For imbalanced datasets, the focus of classification is to identify samples of the minority class, and current data mining algorithms do not perform well enough on such datasets. The synthetic minority over-sampling technique (SMOTE) was designed specifically for learning from imbalanced datasets: it generates synthetic minority class examples by interpolating between nearby minority class examples. However, SMOTE suffers from the overgeneralization problem. Density-based spatial clustering of applications with noise (DBSCAN), for its part, is not rigorous in handling samples near the borderline. We optimize the DBSCAN algorithm to address this problem and make the clustering more reasonable. This paper integrates the optimized DBSCAN with SMOTE and proposes a density-based synthetic minority over-sampling technique (DSMOTE). First, the optimized DBSCAN divides the minority class samples into three groups: core samples, borderline samples and noise samples. The minority class noise samples are then removed so that more effective samples can be synthesized. To make full use of the information in the core and borderline samples, different over-sampling strategies are applied to each of these two groups. Experiments show that DSMOTE achieves better precision, recall and F-value than SMOTE and Borderline-SMOTE.
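The abstract describes the pipeline only at a high level, so the following is a minimal Python sketch under stated assumptions, not the authors' DSMOTE. It labels minority samples with the standard DBSCAN core/border/noise definitions (the paper's optimized DBSCAN is not described in the abstract), drops the noise samples, and then applies classic SMOTE interpolation, `x_new = x_i + gap * (x_nn - x_i)` with `gap` drawn uniformly from [0, 1) and `x_nn` one of the `k` nearest minority neighbours of `x_i`, to the remaining samples. The paper's separate strategies for core and borderline samples are not specified, so this sketch treats both groups identically. The function names (`label_minority`, `smote`, `density_filtered_smote`) and parameters (`eps`, `min_pts`, `k`, `seed`) are hypothetical.

```python
import math
import random


def label_minority(points, eps, min_pts):
    """Label each minority sample 'core', 'border' or 'noise'.

    Standard DBSCAN point definitions (a point counts itself toward
    min_pts). This is NOT the paper's optimized DBSCAN.
    """
    n = len(points)
    neighbors = [
        [j for j in range(n) if j != i and math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    is_core = [len(nb) + 1 >= min_pts for nb in neighbors]
    labels = []
    for i in range(n):
        if is_core[i]:
            labels.append("core")
        elif any(is_core[j] for j in neighbors[i]):
            labels.append("border")  # not core, but within eps of a core sample
        else:
            labels.append("noise")
    return labels


def smote(points, n_new, k=5, rng=None):
    """Classic SMOTE: x_new = x_i + gap * (x_nn - x_i), gap ~ U[0, 1)."""
    if len(points) < 2:
        raise ValueError("SMOTE needs at least two samples")
    if rng is None:
        rng = random.Random()
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(points))
        # k nearest minority neighbours of points[i], excluding itself
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(points[i], points[j]),
        )
        nn = points[rng.choice(others[:k])]
        gap = rng.random()
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(points[i], nn)))
    return synthetic


def density_filtered_smote(points, n_new, eps, min_pts, k=5, seed=0):
    """Hypothetical pipeline: drop DBSCAN noise, then plain SMOTE on the rest."""
    labels = label_minority(points, eps, min_pts)
    kept = [p for p, lab in zip(points, labels) if lab != "noise"]
    return labels, smote(kept, n_new, k, random.Random(seed))


if __name__ == "__main__":
    minority = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (1.8, 0.5), (10, 10)]
    labels, new = density_filtered_smote(minority, n_new=5, eps=1.0, min_pts=4)
    print(labels)
    # -> ['core', 'core', 'core', 'core', 'core', 'border', 'noise']
```

In this toy run the isolated point (10, 10) is labelled noise and discarded, so every synthetic sample is a convex combination of two retained minority samples and stays inside their region. That is the intuition behind removing noise before interpolating; how DSMOTE then treats core and borderline samples differently is left to the paper.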
Pages: 1182-1191
Number of pages: 10