Over-sampling algorithm for imbalanced data classification

Citations: 0
Authors
XU Xiaolong [1]
CHEN Wen [2]
SUN Yanfei [3]
Affiliations
[1] Jiangsu Key Laboratory of Big Data Security & Intelligent Processing, Nanjing University of Posts and Telecommunications
[2] Institute of Big Data Research at Yancheng, Nanjing University of Posts and Telecommunications
[3] Office of Scientific R&D, Nanjing University of Posts and Telecommunications
Keywords
imbalanced data; density-based spatial clustering of applications with noise (DBSCAN); synthetic minority over-sampling technique (SMOTE); over-sampling
DOI
Not available
Chinese Library Classification number
TP311.13
Discipline classification code
1201
Abstract
For imbalanced datasets, the focus of classification is to identify samples of the minority class. Current data mining algorithms do not perform well enough on imbalanced datasets. The synthetic minority over-sampling technique (SMOTE) is designed specifically for learning from imbalanced datasets: it generates synthetic minority class examples by interpolating between nearby minority class examples. However, SMOTE suffers from the overgeneralization problem. The density-based spatial clustering of applications with noise (DBSCAN) algorithm is not rigorous when dealing with samples near the borderline. We optimize the DBSCAN algorithm for this problem to make clustering more reasonable. This paper integrates the optimized DBSCAN with SMOTE and proposes a density-based synthetic minority over-sampling technique (DSMOTE). First, the optimized DBSCAN divides the minority class samples into three groups: core samples, borderline samples, and noise samples. The noise samples of the minority class are then removed so that more effective samples can be synthesized. To make full use of the information in core samples and borderline samples, different strategies are used to over-sample each group. Experiments show that DSMOTE achieves better results than SMOTE and Borderline-SMOTE in terms of precision, recall, and F-value.
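The pipeline outlined in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' DSMOTE: it uses the textbook DBSCAN definitions of core, border, and noise points rather than the paper's optimized variant, and it applies the same plain SMOTE interpolation to core and borderline samples, because the abstract does not describe the two group-specific strategies. All function names and the parameters `eps`, `min_pts`, `k`, and `n_new` are hypothetical.

```python
import math
import random


def dbscan_roles(points, eps, min_pts):
    """Label each point 'core', 'border', or 'noise' (standard DBSCAN definitions)."""
    n = len(points)
    # A neighbourhood includes the point itself, as in the usual DBSCAN convention.
    neighbours = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    is_core = [len(nb) >= min_pts for nb in neighbours]
    roles = []
    for i in range(n):
        if is_core[i]:
            roles.append("core")
        elif any(is_core[j] for j in neighbours[i]):
            roles.append("border")  # not dense itself, but within eps of a core point
        else:
            roles.append("noise")
    return roles


def smote_interpolate(sample, neighbour, rng):
    """Return a synthetic point on the segment between sample and neighbour."""
    gap = rng.random()  # uniform in [0, 1)
    return tuple(s + gap * (nb - s) for s, nb in zip(sample, neighbour))


def oversample(minority, eps, min_pts, k, n_new, seed=0):
    """Drop noise points, then create n_new synthetic minority samples."""
    rng = random.Random(seed)
    roles = dbscan_roles(minority, eps, min_pts)
    kept = [p for p, r in zip(minority, roles) if r != "noise"]
    if len(kept) < 2:
        return []  # nothing to interpolate between
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(kept)
        # The k nearest remaining minority samples, excluding the base itself.
        nearest = sorted(
            (p for p in kept if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        synthetic.append(smote_interpolate(base, rng.choice(nearest), rng))
    return synthetic


minority = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (2.2, 1.0), (10.0, 10.0)]
roles = dbscan_roles(minority, eps=1.5, min_pts=4)
synthetic = oversample(minority, eps=1.5, min_pts=4, k=3, n_new=20)
```

In this toy data the four points on the unit square come out as core samples, `(2.2, 1.0)` as a borderline sample, and `(10.0, 10.0)` as noise, so every synthetic point is interpolated between the five retained samples only.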
Pages: 1182-1191
Number of pages: 10
Related papers
50 in total
  • [31] An Over-sampling Method Based on Probability Density Estimation for Imbalanced Datasets Classification
    Cao, Lu
    Zhai, Yi-Kui
    PROCEEDINGS OF THE 2016 INTERNATIONAL CONFERENCE ON INTELLIGENT INFORMATION PROCESSING (ICIIP'16), 2016,
  • [32] Adaptive over-sampling method for classification with application to imbalanced datasets in aluminum electrolysis
    Huang, Zhaoke
    Yang, Chunhua
    Chen, Xiaofang
    Huang, Keke
    Xie, Yongfang
    NEURAL COMPUTING & APPLICATIONS, 2020, 32 (11): 7183 - 7199
  • [33] Adaptive over-sampling method for classification with application to imbalanced datasets in aluminum electrolysis
    Zhaoke Huang
    Chunhua Yang
    Xiaofang Chen
    Keke Huang
    Yongfang Xie
    Neural Computing and Applications, 2020, 32 : 7183 - 7199
  • [34] Multilabel Over-sampling and Under-sampling with Class Alignment for Imbalanced Multilabel Text Classification
    Taha, Adil Yaseen
    Tiun, Sabrina
    Abd Rahman, Abdul Hadi
    Sabah, Ali
    JOURNAL OF INFORMATION AND COMMUNICATION TECHNOLOGY-MALAYSIA, 2021, 20 (03): 423 - 456
  • [35] An efficient algorithm coupled with synthetic minority over-sampling technique to classify imbalanced PubChem BioAssay data
    Hao, Ming
    Wang, Yanli
    Bryant, Stephen H.
    ANALYTICA CHIMICA ACTA, 2014, 806 : 117 - 127
  • [36] Deep convolutional neural networks with genetic algorithm-based synthetic minority over-sampling technique for improved imbalanced data classification
    Alex, Suja A.
    Nayahi, J. Jesu Vedha
    Kaddoura, Sanaa
    APPLIED SOFT COMPUTING, 2024, 156
  • [37] A sparrow search algorithm-optimized convolutional neural network for imbalanced data classification using synthetic minority over-sampling technique
    Deng, Wu
    He, Qi
    Zhou, Xiangbing
    Chen, Huayue
    Zhao, Huimin
    PHYSICA SCRIPTA, 2023, 98 (11)
  • [38] A Novel Cluster based Over-sampling Approach for Classifying Imbalanced Sentiment Data
    Chang, Jing-Rong
    Chen, Long-Sheng
    Lin, Li-Wei
    IAENG International Journal of Computer Science, 2021, 48 (04):
  • [39] Borderline over-sampling in feature space for learning algorithms in imbalanced data environments
    Savetratanakaree, Kittipat, 1600, International Association of Engineers (43):
  • [40] Dynamic Synthetic Minority Over-Sampling Technique-Based Rotation Forest for the Classification of Imbalanced Hyperspectral Data
    Feng, Wei
    Dauphin, Gabriel
    Huang, Wenjiang
    Quan, Yinghui
    Bao, Wenxing
    Wu, Mingquan
    Li, Qiang
    IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 2019, 12 (07) : 2159 - 2169