Affine combination-based over-sampling for imbalanced regression

Cited by: 0
Authors
Li, Zhen-Zhen [1, 2]
Huang, Niu [1, 2]
Yi, Lun-Zhao [3]
Fu, Guang-Hui [1, 2]
Affiliations
[1] Kunming Univ Sci & Technol, Sch Sci, 727 Jingming South Rd, Kunming 650500, Yunnan, Peoples R China
[2] Kunming Univ Sci & Technol, Ctr Appl Stat, Kunming 650500, Yunnan, Peoples R China
[3] Kunming Univ Sci & Technol, Fac Food Sci & Engn, Kunming, Yunnan, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
affine combination; imbalanced data; imbalanced regression; over-sampling; CLASSIFICATION; CLASSIFIERS; CHALLENGES; MODELS; SMOTE; READY;
DOI
10.1002/cem.3537
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Prediction in imbalanced domains is an active research topic, and many real-world data mining tasks require building predictive models from imbalanced data. Imbalanced classification has been studied extensively, whereas imbalanced regression has received comparatively little attention. In imbalanced regression, rare target values occur infrequently, yet the goal is to predict the continuous target variable accurately for exactly those rare instances. One of the challenges is finding a suitable strategy to rebalance the original dataset so that the model's predictive performance on rare instances improves. In this study, two over-sampling algorithms are proposed: sigma nearest over-sampling based on convex combination for regression (SNOCCR) and affine combination-based over-sampling (ACOS). ACOS rebalances the original dataset by generating new instances as affine combinations of the original examples. The region in which new instances are generated can be adjusted according to the distribution of the data, so that the generated cases better mimic the distribution of the original examples. ACOS, SNOCCR, and other preprocessing methods were compared on 15 datasets to assess how well models trained on the rebalanced datasets predict rare instances. The experimental results indicate that ACOS outperforms the other existing methods.
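The difference between a convex and an affine combination, which the abstract relies on, can be illustrated with a small sketch. This is not the authors' ACOS or SNOCCR procedure: the function names `combine` and `oversample`, the random pairing of rare instances, the uniform sampling of the coefficient, and the choice to interpolate the target with the same weights are all assumptions made for illustration. In both cases the two weights sum to 1; a convex combination additionally keeps the coefficient in [0, 1], so new points stay on the segment between two examples, while an affine combination may place them beyond either endpoint.

```python
import numpy as np

def combine(x_a, x_b, lam):
    """Return lam * x_a + (1 - lam) * x_b; the two weights always sum to 1."""
    x_a = np.asarray(x_a, dtype=float)
    x_b = np.asarray(x_b, dtype=float)
    return lam * x_a + (1.0 - lam) * x_b

def oversample(X, y, n_new, lam_low=0.0, lam_high=1.0, seed=0):
    """Generate n_new synthetic (features, target) pairs from rare instances.

    Hypothetical helper, not the published algorithm.
    lam_low=0, lam_high=1 gives convex combinations (points on the segment);
    a wider range such as (-0.5, 1.5) gives affine combinations that can
    fall outside the segment between the two chosen examples.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    new_X, new_y = [], []
    for _ in range(n_new):
        # Assumption: pair two distinct rare instances uniformly at random.
        i, j = rng.choice(len(X), size=2, replace=False)
        lam = rng.uniform(lam_low, lam_high)
        new_X.append(combine(X[i], X[j], lam))
        # Assumption: the target is combined with the same weights.
        new_y.append(combine(y[i], y[j], lam))
    return np.array(new_X), np.array(new_y)

# Toy rare instances (made-up values, for illustration only).
rare_X = np.array([[0.0, 0.0], [2.0, 4.0], [4.0, 2.0]])
rare_y = np.array([10.0, 20.0, 30.0])

convex_X, convex_y = oversample(rare_X, rare_y, n_new=5)
affine_X, affine_y = oversample(rare_X, rare_y, n_new=5, lam_low=-0.5, lam_high=1.5)
```

Widening the coefficient range is the simplest way to show how the generation region can grow or shrink; how ACOS actually adapts that region to the data distribution is described in the paper itself, not here.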
Pages: 22
Related papers
50 in total
  • [21] Margin-Based Over-Sampling Method for Learning from Imbalanced Datasets
    Fan, Xiannian
    Tang, Ke
    Weise, Thomas
    ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PT II: 15TH PACIFIC-ASIA CONFERENCE, PAKDD 2011, 2011, 6635 : 309 - 320
  • [22] A Novel Evolutionary Preprocessing Method Based on Over-sampling and Under-sampling for Imbalanced Datasets
    Wong, Ginny Y.
    Leung, Frank H. F.
    Ling, Sai-Ho
    39TH ANNUAL CONFERENCE OF THE IEEE INDUSTRIAL ELECTRONICS SOCIETY (IECON 2013), 2013, : 2354 - 2359
  • [23] A novel clustering-based over-sampling technique for imbalanced data sets
    Mirzaei, Behzad
    Nezamabadi-pour, Hossein
    Mahmoodi, Javad
    2024 32ND INTERNATIONAL CONFERENCE ON ELECTRICAL ENGINEERING, ICEE 2024, 2024, : 662 - 668
  • [24] Feature selection and its combination with data over-sampling for multi-class imbalanced datasets
    Tsai, Chih-Fong
    Chen, Kuan-Chen
    Lin, Wei-Chao
    APPLIED SOFT COMPUTING, 2024, 153
  • [25] Classifier Learning from Imbalanced Corpus by Autoencoded Over-Sampling
    Park, Eunkyung
    Wong, Raymond K.
    Chu, Victor W.
    PRICAI 2019: TRENDS IN ARTIFICIAL INTELLIGENCE, PT I, 2019, 11670 : 16 - 29
  • [26] An Effective Over-sampling Method for Imbalanced Data Sets Classification
    Zhai Yun
    Ma Nan
    Ruan Da
    An Bing
    CHINESE JOURNAL OF ELECTRONICS, 2011, 20 (03): : 489 - 494
  • [27] Multiple adaptive over-sampling for imbalanced data evidential classification
    Zhang, Zhen
    Tian, Hong-peng
    Jin, Jin-shuai
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2024, 133
  • [28] Graph-Based Approaches for Over-Sampling in the Context of Ordinal Regression
    Perez-Ortiz, Maria
    Antonio Gutierrez, Pedro
    Hervas-Martinez, Cesar
    Yao, Xin
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2015, 27 (05) : 1233 - 1245
  • [29] SROT: Sparse representation-based over-sampling technique for classification of imbalanced dataset
    Zou, Xionggao
    Feng, Yueping
    Li, Huiying
    Jiang, Shuyu
    2ND INTERNATIONAL CONFERENCE ON MATERIALS SCIENCE, ENERGY TECHNOLOGY AND ENVIRONMENTAL ENGINEERING (MSETEE 2017), 2017, 81
  • [30] PDFOS: PDF estimation based over-sampling for imbalanced two-class problems
    Gao, Ming
    Hong, Xia
    Chen, Sheng
    Harris, Chris J.
    Khalaf, Emad
    NEUROCOMPUTING, 2014, 138 : 248 - 259