Performance of asymmetric links and correction methods for imbalanced data in binary regression

Cited by: 10
Authors
Huayanay, Alex de la Cruz [1]
Bazan, Jorge L. [2]
Cancho, Vicente G. [2]
Dey, Dipak K. [3]
Affiliations
[1] USP UFSCar, Interinst Grad Stat, Sao Carlos, SP, Brazil
[2] Univ Sao Paulo, Dept Appl Math & Stat, Sao Carlos, SP, Brazil
[3] Univ Connecticut, Dept Stat, Mansfield, CT USA
Funding
Sao Paulo Research Foundation (FAPESP), Brazil;
Keywords
Asymmetric link; binary regression; imbalanced data; predictive evaluation; quantile residuals; similarity measures; cross-validation; model; probit
DOI
10.1080/00949655.2019.1593984
Chinese Library Classification (CLC) number
TP39 [Computer applications];
Subject classification code
081203; 0835;
Abstract
In binary regression, imbalanced data arise when zeros (or ones) occur in a proportion significantly greater than that of the corresponding ones (or zeros). In this work, we evaluate two methods developed to deal with imbalanced data and compare them with the use of asymmetric links. The results of a simulation study show that the correction methods do not adequately correct the bias in the estimation of the regression coefficients, and that the models with the power and reverse power links considered produce better results for certain types of imbalanced data. Additionally, we present an application to imbalanced data, identifying the best model among those proposed. The parameters are estimated using a Bayesian approach based on the Hamiltonian Monte Carlo method with the No-U-Turn Sampler algorithm, and the models are compared using different criteria for model comparison, predictive evaluation and quantile residuals.
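To make the idea of an asymmetric link concrete, the following is a minimal sketch of one common way power and reverse power links are formulated. This record does not state the definitions, so the forms here are an assumption: the success probability is taken as F(eta)^lambda for the power link and 1 - F(-eta)^lambda for the reverse power link, with a logistic baseline F. The function names are hypothetical and are not taken from the paper.

```python
import math


def logistic_cdf(eta):
    """Baseline symmetric CDF F (logistic); an assumed choice of baseline."""
    return 1.0 / (1.0 + math.exp(-eta))


def power_link_prob(eta, lam):
    """Assumed power link: p = F(eta) ** lam, with lam > 0."""
    return logistic_cdf(eta) ** lam


def reverse_power_link_prob(eta, lam):
    """Assumed reverse power link: p = 1 - F(-eta) ** lam, with lam > 0."""
    return 1.0 - logistic_cdf(-eta) ** lam


if __name__ == "__main__":
    # With lam = 1 both forms reduce to the symmetric logistic link.
    print(power_link_prob(0.0, 1.0), reverse_power_link_prob(0.0, 1.0))
    # With lam = 2 the response curve is skewed: at eta = 0 the power link
    # gives 0.25 and the reverse power link gives 0.75.
    print(power_link_prob(0.0, 2.0), reverse_power_link_prob(0.0, 2.0))
```

Under these assumed forms, the extra parameter lam shifts the probability at a given linear predictor away from 0.5, which is what lets the fitted curve accommodate a rare class.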
Pages: 1694-1714
Number of pages: 21
Related papers
50 in total
  • [1] Logistic regression in large rare events and imbalanced data: A performance comparison of prior correction and weighting methods
    Maalouf, Maher
    Homouz, Dirar
    Trafalis, Theodore B.
    COMPUTATIONAL INTELLIGENCE, 2018, 34 (01) : 161 - 174
  • [2] Asymmetric Binary Regression Models for Imbalanced Datasets: An Application to Students' Churn
    La Rocca, Michele
    Niglio, Marcella
    Restaino, Marialuisa
    RECENT TRENDS AND FUTURE CHALLENGES IN LEARNING FROM DATA, ECDA 2022, 2024, : 63 - 74
  • [3] Flexible cloglog links for binomial regression models as an alternative for imbalanced medical data
    Alves, Jessica S. B.
    Bazan, Jorge L.
    Arellano-Valle, Reinaldo B.
    BIOMETRICAL JOURNAL, 2023, 65 (03)
  • [4] Predictive Performance of Logistic Regression for Imbalanced Data with Categorical Covariate
    Abd Rahman, Hezlin Aryani
    Wah, Yap Bee
    Huat, Ong Seng
    PERTANIKA JOURNAL OF SCIENCE AND TECHNOLOGY, 2021, 29 (01): : 181 - 197
  • [5] Predictive Performance of Logistic Regression for Imbalanced Data with Categorical Covariate
    Abd Rahman, Hezlin Aryani
    Wah, Yap Bee
    Huat, Ong Seng
    PERTANIKA JOURNAL OF SCIENCE AND TECHNOLOGY, 2020, 28 (04): : 1141 - 1161
  • [6] Comparison of resampling methods for dealing with imbalanced data in binary classification problem
    Park, Geun U.
    Jun, Inkyun G.
    KOREAN JOURNAL OF APPLIED STATISTICS, 2019, 32 (03) : 349 - 374
  • [7] Binary Classification with Imbalanced Data
    Chiang, Jyun-You
    Lio, Yuhlong
    Hsu, Chien-Ya
    Ho, Chia-Ling
    Tsai, Tzong-Ru
    ENTROPY, 2024, 26 (01)
  • [8] Calibration methods in imbalanced binary classification
    Guilbert, Theo
    Caelen, Olivier
    Chirita, Andrei
    Saerens, Marco
    ANNALS OF MATHEMATICS AND ARTIFICIAL INTELLIGENCE, 2024, 92 (05) : 1319 - 1352
  • [9] Oversampling techniques for imbalanced data in regression
    Belhaouari, Samir Brahim
    Islam, Ashhadul
    Kassoul, Khelil
    Al-Fuqaha, Ala
    Bouzerdoum, Abdesselam
    EXPERT SYSTEMS WITH APPLICATIONS, 2024, 252
  • [10] Stable variable ranking and selection in regularized logistic regression for severely imbalanced big binary data
    Nadeem, Khurram
    Jabri, Mehdi-Abderrahman
    PLOS ONE, 2023, 18 (01):