Affine combination-based over-sampling for imbalanced regression

Cited by: 0
Authors
Li, Zhen-Zhen [1 ,2 ]
Huang, Niu [1 ,2 ]
Yi, Lun-Zhao [3 ]
Fu, Guang-Hui [1 ,2 ]
Affiliations
[1] Kunming Univ Sci & Technol, Sch Sci, 727 Jingming South Rd, Kunming 650500, Yunnan, Peoples R China
[2] Kunming Univ Sci & Technol, Ctr Appl Stat, Kunming 650500, Yunnan, Peoples R China
[3] Kunming Univ Sci & Technol, Fac Food Sci & Engn, Kunming, Yunnan, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
affine combination; imbalanced data; imbalanced regression; over-sampling; CLASSIFICATION; CLASSIFIERS; CHALLENGES; MODELS; SMOTE; READY;
DOI
10.1002/cem.3537
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Prediction on imbalanced domains is an active research topic, and many real-world data mining tasks must build predictive models from imbalanced data. Imbalanced classification has been studied extensively, whereas imbalanced regression has received comparatively little attention. In imbalanced regression, rare target values occur infrequently, yet the goal is to predict the continuous target variable of exactly those rare instances accurately. One challenge is finding a suitable strategy to rebalance the original dataset so that the model's predictive performance on rare instances improves. This study proposes two over-sampling algorithms: sigma nearest over-sampling based on convex combination for regression (SNOCCR) and affine combination-based over-sampling (ACOS). ACOS rebalances the original dataset by generating new instances as affine combinations of the original examples. The region in which new instances are generated can be adjusted to the distribution of the data, so that the generated cases better mimic the distribution of the original examples. ACOS, SNOCCR, and other preprocessing methods were compared on 15 datasets to assess how well models trained on the rebalanced datasets predict rare instances. The experimental results indicate that ACOS outperforms the existing methods it was compared against.
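The distinction between a convex and an affine combination, which the abstract relies on, can be illustrated with a small sketch. This is not the authors' ACOS or SNOCCR procedure: the abstract does not specify how partner examples are chosen, how the generation region is adjusted, or how the target value of a synthetic instance is set. The function names, the `is_rare` predicate, the `t_range` interval, random pairing of rare examples, and applying the same weights to the target are all assumptions made for illustration only. The one fact used is the definition: an affine combination has weights that sum to 1, and a convex combination additionally requires the weights to be non-negative.

```python
import random


def affine_combination(a, b, t):
    """Return (1 - t) * a + t * b for two equal-length vectors.

    The weights (1 - t) and t sum to 1 for any real t, so this is an
    affine combination. It is a convex combination only when 0 <= t <= 1.
    """
    return [(1 - t) * ai + t * bi for ai, bi in zip(a, b)]


def oversample_rare(X, y, is_rare, n_new, t_range=(-0.25, 1.25), seed=0):
    """Generate n_new synthetic (x, y) pairs from pairs of rare examples.

    Hypothetical illustration, not the published algorithm:
    - is_rare is a user-supplied predicate on the target value;
    - partners are drawn uniformly at random among the rare examples;
    - t_range may extend outside [0, 1], which lets synthetic points fall
      beyond the segment joining the two parents;
    - the synthetic target reuses the same affine weights (an assumption).
    """
    rng = random.Random(seed)
    rare = [i for i in range(len(y)) if is_rare(y[i])]
    if len(rare) < 2:
        raise ValueError("need at least two rare examples to combine")
    new_X, new_y = [], []
    for _ in range(n_new):
        i, j = rng.sample(rare, 2)       # two distinct rare parents
        t = rng.uniform(*t_range)        # weight, possibly outside [0, 1]
        new_X.append(affine_combination(X[i], X[j], t))
        new_y.append((1 - t) * y[i] + t * y[j])
    return new_X, new_y
```

Setting `t_range=(0.0, 1.0)` restricts the sketch to convex combinations, so every synthetic point lies on the segment between its two parents; a wider interval is what makes the combination affine rather than convex.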
Pages: 22
Related papers
50 in total
  • [31] GOS-IL: A Generalized Over-Sampling Based Online Imbalanced Learning Framework
    Barua, Sukarna
    Islam, Md. Monirul
    Murase, Kazuyuki
    NEURAL INFORMATION PROCESSING, PT I, 2015, 9489 : 680 - 687
  • [32] An adaptive over-sampling method for imbalanced data based on simultaneous clustering and filtering noisy
    Chen, Wei
    Guo, Wenjie
    Mao, Weijie
    APPLIED INTELLIGENCE, 2024, 54 (22) : 11430 - 11449
  • [33] A self-adaptive synthetic over-sampling technique for imbalanced classification
    Gu, Xiaowei
    Angelov, Plamen P.
    Soares, Eduardo A.
    INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS, 2020, 35 (06) : 923 - 943
  • [34] Diversity and Separable Metrics in Over-Sampling Technique for Imbalanced Data Classification
    Mahmoudi, Shadi
    Moradi, Parham
    Akhlaghian, Fardin
    Moradi, Rizan
    2014 4TH INTERNATIONAL CONFERENCE ON COMPUTER AND KNOWLEDGE ENGINEERING (ICCKE), 2014, : 152 - 158
  • [35] AMDO: An Over-Sampling Technique for Multi-Class Imbalanced Problems
    Yang, Xuebing
    Kuang, Qiuming
    Zhang, Wensheng
    Zhang, Guoping
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2018, 30 (09) : 1672 - 1685
  • [36] Clustering boundary over-sampling classification method for imbalanced data sets
    Lou, Xiao-Jun
    Sun, Yu-Xuan
    Liu, Hai-Tao
    Liu, H.-T. (liuhaitao@wsn.cn), 1600, Zhejiang University (47): : 944 - 950
  • [37] Learning from Imbalanced Data Using Over-Sampling and the Firefly Algorithm
    Czarnowski, Ireneusz
    COMPUTATIONAL COLLECTIVE INTELLIGENCE (ICCCI 2021), 2021, 12876 : 373 - 386
  • [38] Enriched Over-Sampling Techniques for Improving Classification of Imbalanced Big Data
    Patil, Sachin Subhash
    Sonavane, Shefali Pratap
    2017 THIRD IEEE INTERNATIONAL CONFERENCE ON BIG DATA COMPUTING SERVICE AND APPLICATIONS (IEEE BIGDATASERVICE 2017), 2017, : 1 - 10
  • [39] An over-sampling expert system for learning from imbalanced data sets
    He, GX
    Han, H
    Wang, WY
    PROCEEDINGS OF THE 2005 INTERNATIONAL CONFERENCE ON NEURAL NETWORKS AND BRAIN, VOLS 1-3, 2005, : 537 - 541
  • [40] RWO-Sampling: A random walk over-sampling approach to imbalanced data classification
    Zhang, Huaxiang
    Li, Mingfang
    INFORMATION FUSION, 2014, 20 : 99 - 116