Affine combination-based over-sampling for imbalanced regression

Authors
Li, Zhen-Zhen [1, 2]
Huang, Niu [1, 2]
Yi, Lun-Zhao [3]
Fu, Guang-Hui [1, 2]
Affiliations
[1] Kunming Univ Sci & Technol, Sch Sci, 727 Jingming South Rd, Kunming 650500, Yunnan, Peoples R China
[2] Kunming Univ Sci & Technol, Ctr Appl Stat, Kunming 650500, Yunnan, Peoples R China
[3] Kunming Univ Sci & Technol, Fac Food Sci & Engn, Kunming, Yunnan, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
affine combination; imbalanced data; imbalanced regression; over-sampling; CLASSIFICATION; CLASSIFIERS; CHALLENGES; MODELS; SMOTE; READY;
DOI
10.1002/cem.3537
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Prediction on imbalanced domains is an active research topic. Many real-world data mining tasks require building predictive models from imbalanced data. Under imbalance, classification problems have been studied extensively, whereas regression problems have received comparatively little attention. In imbalanced regression, rare target values occur infrequently by definition, yet accurately predicting the continuous target variable for these rare instances is the main goal. One of the challenges is finding a suitable strategy to rebalance the original dataset so that the model's predictive performance on rare instances improves. In this study, two over-sampling algorithms are proposed: sigma nearest over-sampling based on convex combination for regression (SNOCCR) and affine combination-based over-sampling (ACOS). ACOS rebalances the original dataset by generating new instances through affine combinations of the original examples. The region in which new instances are generated can be adjusted based on the distribution of the data, so that the generated cases better mimic the distribution of the original examples. ACOS, SNOCCR, and other preprocessing methods were compared on 15 datasets to evaluate the predictive performance, on rare instances, of models trained on the rebalanced datasets. The experimental results indicate that ACOS outperforms the other existing methods.
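The difference between a convex and an affine combination can be made concrete with a small sketch. The code below is not the ACOS or SNOCCR algorithm from the paper; it is a minimal illustration that assumes rare instances are already flagged by a boolean mask, pairs them at random, and draws a single mixing weight `lam` per pair. The function name `affine_oversample` and the parameter `lam_range` are hypothetical. With `lam_range=(0, 1)` the new points are convex combinations (interpolation between two rare examples); a wider range permits affine combinations that extrapolate beyond the segment, which is one simple way the generation region could be enlarged or shrunk.

```python
import numpy as np

def affine_oversample(X, y, rare_mask, n_new, lam_range=(-0.25, 1.25), seed=0):
    """Illustrative only: append n_new synthetic rare instances to (X, y).

    Each synthetic instance is (1 - lam) * a + lam * b for two randomly
    chosen rare instances a and b. The two weights always sum to 1, so the
    result is an affine combination; it is convex only when 0 <= lam <= 1.
    The target value is combined with the same weights as the features.
    """
    rng = np.random.default_rng(seed)
    X_rare, y_rare = X[rare_mask], y[rare_mask]

    # Randomly pair rare instances (assumption: no neighbour search here).
    i = rng.integers(0, len(X_rare), size=n_new)
    j = rng.integers(0, len(X_rare), size=n_new)

    # One mixing weight per synthetic instance.
    lam = rng.uniform(lam_range[0], lam_range[1], size=n_new)

    X_new = (1.0 - lam)[:, None] * X_rare[i] + lam[:, None] * X_rare[j]
    y_new = (1.0 - lam) * y_rare[i] + lam * y_rare[j]

    return np.vstack([X, X_new]), np.concatenate([y, y_new])


# Toy data: the last three rows have rare (large) target values.
X = np.array([[0.0, 1.0], [1.0, 3.0], [2.0, 0.0],
              [10.0, 5.0], [12.0, 4.0], [11.0, 7.0]])
y = 2.0 * X[:, 0] - X[:, 1] + 1.0
rare_mask = y > 10

X_bal, y_bal = affine_oversample(X, y, rare_mask, n_new=5)
X_cvx, y_cvx = affine_oversample(X, y, rare_mask, n_new=5, lam_range=(0.0, 1.0))
```

Because the weights sum to 1, any exactly affine relationship between features and target is preserved in the synthetic points, whether they are interpolated or extrapolated. How the paper selects partners, weights, and the generation region is not specified in the abstract, so those choices here are placeholders.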
Pages: 22