SMOTEBoost for Regression: Improving the Prediction of Extreme Values

Cited by: 25
Authors
Moniz, Nuno [1]
Ribeiro, Rita P. [1]
Cerqueira, Vitor [1]
Chawla, Nitesh [2]
Affiliations
[1] Univ Porto, INESC TEC, Porto, Portugal
[2] Univ Notre Dame, Notre Dame, IN USA
Keywords
Imbalanced Domain Learning; Ensemble Learning; Boosting; Regression
DOI
10.1109/DSAA.2018.00025
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Supervised learning with imbalanced domains is one of the biggest challenges in machine learning. Such tasks differ from standard learning tasks in that they assume a skewed distribution of the target variable and a user domain preference towards under-represented cases. Most research has focused on imbalanced classification tasks, where a wide range of solutions has been tested. However, little work has addressed imbalanced regression tasks. In this paper, we propose an adaptation of the SMOTEBoost approach to the problem of imbalanced regression. Originally designed for classification tasks, SMOTEBoost combines boosting methods with the SMOTE resampling strategy. We present four variants of SMOTEBoost and evaluate them experimentally on 30 datasets, with an extensive analysis of the results, to assess how well the SMOTEBoost methods predict extreme target values and what predictive trade-off they entail relative to baseline boosting methods. SMOTEBoost is publicly available in a software package.
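The abstract names the two ingredients, boosting and SMOTE resampling, without the details of any of the four variants. Below is a minimal pure-Python sketch of how the two can be combined for regression under stated assumptions. It is not the authors' method or the API of their package. The assumptions are as follows. The boosting scheme is AdaBoost.R2 with linear loss (Drucker, 1997). The oversampler is SMOTER-style: it interpolates between two rare cases and sets the new target to a distance-weighted average of the seeds' targets. A hypothetical "top 10% of the target" rule (`rare_quantile`) stands in for a proper relevance function. Decision stumps serve as weak learners. All function names (`fit_stump`, `smoter`, `smoteboost_r2`) are illustrative.

```python
import math
import random


def fit_stump(X, y):
    """Weak learner: a depth-1 regression tree chosen by minimum squared error."""
    n = len(y)
    tot, tot2 = sum(y), sum(v * v for v in y)
    # (sse, feature, threshold, left value, right value); default = predict the mean
    best = (float("inf"), 0, float("inf"), tot / n, tot / n)
    for j in range(len(X[0])):
        order = sorted(range(n), key=lambda i: X[i][j])
        s = s2 = 0.0
        for k in range(n - 1):
            v = y[order[k]]
            s, s2 = s + v, s2 + v * v
            a, b = X[order[k]][j], X[order[k + 1]][j]
            if a == b:
                continue  # no threshold separates equal feature values
            nl, nr = k + 1, n - k - 1
            sse = (s2 - s * s / nl) + ((tot2 - s2) - (tot - s) ** 2 / nr)
            if sse < best[0]:
                best = (sse, j, (a + b) / 2, s / nl, (tot - s) / nr)
    _, j, thr, left, right = best
    return lambda x: left if x[j] <= thr else right


def smoter(X, y, is_rare, k, n_new, rng):
    """SMOTER-style oversampling: interpolate between a rare case and one of its
    k nearest rare neighbours; the new target is a distance-weighted average."""
    rare = [i for i in range(len(y)) if is_rare(y[i])]
    new_X, new_y = [], []
    if len(rare) < 2:
        return new_X, new_y
    for _ in range(n_new):
        i = rng.choice(rare)
        nbrs = sorted((r for r in rare if r != i), key=lambda r: math.dist(X[i], X[r]))[:k]
        m = rng.choice(nbrs)
        gap = rng.random()
        x_new = [a + gap * (b - a) for a, b in zip(X[i], X[m])]
        d1, d2 = math.dist(x_new, X[i]), math.dist(x_new, X[m])
        # the seed closer to the new case gets the larger weight
        y_new = (y[i] + y[m]) / 2 if d1 + d2 == 0 else (d2 * y[i] + d1 * y[m]) / (d1 + d2)
        new_X.append(x_new)
        new_y.append(y_new)
    return new_X, new_y


def smoteboost_r2(X, y, n_rounds=20, k=3, rare_quantile=0.9, seed=0):
    """Sketch: AdaBoost.R2 (linear loss) with SMOTER oversampling in every round."""
    rng = random.Random(seed)
    n = len(y)
    cut = sorted(y)[int(rare_quantile * (n - 1))]
    is_rare = lambda v: v >= cut  # hypothetical stand-in for a relevance function
    w = [1.0 / n] * n
    models, alphas = [], []
    for _ in range(n_rounds):
        idx = rng.choices(range(n), weights=w, k=n)  # weighted bootstrap sample
        Xs, ys = [X[i] for i in idx], [y[i] for i in idx]
        sX, sy = smoter(Xs, ys, is_rare, k, n // 5, rng)  # add synthetic rare cases
        f = fit_stump(Xs + sX, ys + sy)
        err = [abs(y[i] - f(X[i])) for i in range(n)]
        emax = max(err)
        L = [e / emax if emax > 0 else 0.0 for e in err]  # linear loss in [0, 1]
        Lbar = sum(wi * li for wi, li in zip(w, L))
        if Lbar >= 0.5:  # AdaBoost.R2 stopping rule
            if not models:
                models.append(f)
                alphas.append(1.0)
            break
        beta = max(Lbar / (1 - Lbar), 1e-10)
        w = [wi * beta ** (1 - li) for wi, li in zip(w, L)]  # shrink well-predicted cases
        total = sum(w)
        w = [wi / total for wi in w]
        models.append(f)
        alphas.append(math.log(1 / beta))

    def predict(x):
        # weighted median of the stumps' predictions, weights log(1/beta)
        preds = sorted((g(x), a) for g, a in zip(models, alphas))
        half, acc = sum(alphas) / 2, 0.0
        for p, a in preds:
            acc += a
            if acc >= half:
                return p
        return preds[-1][0]

    return predict
```

As a usage example, `model = smoteboost_r2(X, y)` returns a callable, and `model(x)` predicts a single case. Stumps and a quantile cut-off keep the sketch dependency-free. A practical implementation would use stronger weak learners and a relevance function that reflects the user's notion of extreme values.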
Pages: 150-159
Number of pages: 10