A data mining method for imbalanced datasets based on one-sided link and distribution density of instances

Cited by: 0
Authors
Zhai, Yun [1, 2]
Wang, Shu-Peng [3 ]
Ma, Nan [4 ]
Yang, Bing-Ru [2 ]
Zhang, De-Zheng [2 ]
Affiliations
[1] E-Government Research Center, Chinese Academy of Governance, Beijing 100089, China
[2] School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083, China
[3] Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093, China
[4] College of Information Technology, Beijing Union University, Beijing 100101, China
Keywords
Data mining; Learning systems
DOI
10.3969/j.issn.0372-2112.2014.07.011
Abstract
Classification on imbalanced datasets poses a great challenge to the field of machine learning, where the synthetic minority over-sampling technique (SMOTE) has been widely adopted as an effective method. When generating new instances, however, SMOTE uses every instance in the minority class, which makes it prone to over-generalization. To address this problem, this paper proposes a data mining method for imbalanced datasets based on one-sided link and the distribution density of minority instances (OSLDD-SMOTE). OSLDD-SMOTE first uses the one-sided link to select the minority instances near the classification boundary, then generates new instances with SMOTE according to the dynamic distribution density of those instances. The effect of the degree of synthesis on the newly generated instances and on the accuracy of the minority class is compared across OSLDD-SMOTE, SMOTE, Borderline-SMOTE and Surrounding-SMOTE. Furthermore, in simulations on 8 UCI datasets, the proposed method gives the most accurate and robust performance on the G-mean, F-measure and AUC metrics.
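As an illustration of the baseline the abstract builds on, the following is a minimal sketch of plain SMOTE interpolation in Python. It is not the authors' OSLDD-SMOTE: the abstract does not specify the one-sided link selection or the density weighting, so neither is reproduced here. The function name `smote_sketch` and its parameters `k`, `n_new` and `seed` are assumptions made for this example.

```python
import random
from math import dist


def smote_sketch(minority, k=3, n_new=5, seed=0):
    """Plain SMOTE sketch: each synthetic point is placed on the line segment
    between a randomly chosen minority instance and one of its k nearest
    minority neighbours. Hypothetical helper, not the paper's method."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest minority neighbours of x, excluding x itself
        others = [p for p in minority if p is not x]
        neighbours = sorted(others, key=lambda p: dist(x, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(xi + gap * (ni - xi) for xi, ni in zip(x, nb)))
    return synthetic


minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote_sketch(minority, k=2, n_new=10)
```

Because every minority instance is a candidate seed in this baseline, points far from the class boundary are oversampled as readily as points near it; a variant along the lines the abstract describes would restrict the seeds to the boundary instances and vary the number of synthetic points per seed with local density.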
Pages: 1311-1319