Affine combination-based over-sampling for imbalanced regression

Cited by: 0
Authors
Li, Zhen-Zhen [1, 2]
Huang, Niu [1, 2]
Yi, Lun-Zhao [3]
Fu, Guang-Hui [1, 2]
Affiliations
[1] Kunming Univ Sci & Technol, Sch Sci, 727 Jingming South Rd, Kunming 650500, Yunnan, Peoples R China
[2] Kunming Univ Sci & Technol, Ctr Appl Stat, Kunming 650500, Yunnan, Peoples R China
[3] Kunming Univ Sci & Technol, Fac Food Sci & Engn, Kunming, Yunnan, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
affine combination; imbalanced data; imbalanced regression; over-sampling; CLASSIFICATION; CLASSIFIERS; CHALLENGES; MODELS; SMOTE; READY;
DOI
10.1002/cem.3537
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Prediction on imbalanced domains is an active research topic, and many real-world data mining tasks require building predictive models from imbalanced data. Imbalanced classification has been studied extensively, whereas imbalanced regression has received far less attention. In imbalanced regression, rare target values occur infrequently, yet the goal is to predict the continuous target variable accurately for exactly those rare instances. One of the challenges is finding a suitable strategy to rebalance the original dataset so that the model's predictive performance on rare instances improves. In this study, two over-sampling algorithms are proposed: sigma nearest over-sampling based on convex combination for regression (SNOCCR) and affine combination-based over-sampling (ACOS). ACOS rebalances the original dataset by generating new instances through affine combinations of the original examples. The region in which the new instances are generated can be adjusted to the distribution of the data, so that the generated cases better mimic the distribution of the original examples. ACOS, SNOCCR, and other preprocessing methods were compared on 15 datasets to assess how well models trained on the rebalanced datasets predict rare instances. The experimental results indicate that ACOS outperforms other existing methods.
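The difference between a convex and an affine combination can be illustrated with a minimal Python sketch. This is not the authors' ACOS or SNOCCR implementation. The function name `affine_oversample`, the parameters `extend` and `k`, the nearest-neighbour pairing, and the uniform sampling of the coefficient are all assumptions made for illustration. The sketch only shows the general idea that a synthetic example is built as (1 - lam) * a + lam * b, where the two weights sum to 1 and lam may fall outside [0, 1].

```python
import numpy as np

def affine_oversample(X, y, rare_mask, n_new, extend=0.25, k=3, seed=0):
    """Append n_new synthetic (x, y) pairs built from pairs of rare examples.

    Each new point is (1 - lam) * a + lam * b, so the weights sum to 1.
    lam in [0, 1] is a convex combination (a point on the segment a-b);
    lam in [-extend, 1 + extend] is an affine combination that may also
    land slightly beyond either endpoint. All parameters are hypothetical.
    """
    rng = np.random.default_rng(seed)
    X_rare, y_rare = X[rare_mask], y[rare_mask]
    if len(X_rare) < 2:
        raise ValueError("need at least two rare examples to combine")
    k = min(k, len(X_rare) - 1)

    X_new = np.empty((n_new, X.shape[1]))
    y_new = np.empty(n_new)
    for t in range(n_new):
        i = rng.integers(len(X_rare))          # random rare seed example
        dist = np.linalg.norm(X_rare - X_rare[i], axis=1)
        # k nearest rare neighbours, skipping the closest entry
        # (the seed itself, at distance 0)
        neighbours = np.argsort(dist)[1:k + 1]
        j = rng.choice(neighbours)
        lam = rng.uniform(-extend, 1.0 + extend)
        # Combine features and target with the same pair of weights
        X_new[t] = (1 - lam) * X_rare[i] + lam * X_rare[j]
        y_new[t] = (1 - lam) * y_rare[i] + lam * y_rare[j]

    return np.vstack([X, X_new]), np.concatenate([y, y_new])
```

Setting `extend=0` in this sketch restricts the generated points to the segments between rare examples, which is the convex case. A positive `extend` widens the region in which new points can appear. How the actual methods choose that region from the data distribution is not specified in the abstract.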
Pages: 22