Performance of asymmetric links and correction methods for imbalanced data in binary regression

Cited by: 10
Authors
Huayanay, Alex de la Cruz [1]
Bazan, Jorge L. [2]
Cancho, Vicente G. [2]
Dey, Dipak K. [3]
Affiliations
[1] USP UFSCar, Interinst Grad Stat, Sao Carlos, SP, Brazil
[2] Univ Sao Paulo, Dept Appl Math & Stat, Sao Carlos, SP, Brazil
[3] Univ Connecticut, Dept Stat, Mansfield, CT USA
Funding
São Paulo Research Foundation (FAPESP)
Keywords
Asymmetric link; binary regression; imbalanced data; predictive evaluation; quantile residuals; similarity measures; CROSS-VALIDATION; MODEL; PROBIT
DOI
10.1080/00949655.2019.1593984
Chinese Library Classification
TP39 [Computer applications]
Subject classification codes
081203; 0835
Abstract
In binary regression, imbalanced data arise when values equal to zero (or one) occur in a proportion significantly greater than that of values equal to one (or zero). In this work, we evaluate two methods developed to deal with imbalanced data and compare them with the use of asymmetric links. The results of a simulation study show that the correction methods do not adequately correct the bias in the estimation of regression coefficients, and that the models with the power and reverse power links considered produce better results for certain types of imbalanced data. Additionally, we present an application to imbalanced data, identifying the best model among the various ones proposed. The parameters are estimated using a Bayesian approach based on the Hamiltonian Monte Carlo method with the No-U-Turn Sampler algorithm, and the models were compared using different criteria for model comparison, predictive evaluation and quantile residuals.
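The abstract does not spell out the link functions or the residuals. As a minimal sketch, assuming the forms commonly used for these links, the power link p = F(η)^λ and the reverse power link p = 1 - F(-η)^λ with λ > 0 and a symmetric baseline CDF F (logistic or standard normal here), the code below evaluates both links and computes randomized quantile residuals in the Dunn and Smyth sense for a Bernoulli outcome. All function names are hypothetical; this is not the authors' code and it does not implement the Bayesian No-U-Turn Sampler fit.

```python
import math
import random
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal N(0, 1)


def logistic_cdf(eta: float) -> float:
    """Logistic CDF F(eta) = 1 / (1 + exp(-eta)), written to avoid overflow."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


def normal_cdf(eta: float) -> float:
    """Standard normal CDF (probit baseline)."""
    return _STD_NORMAL.cdf(eta)


def power_link(eta: float, lam: float, cdf=logistic_cdf) -> float:
    """Assumed power link: p = F(eta) ** lam, with lam > 0."""
    if lam <= 0:
        raise ValueError("lam must be positive")
    return cdf(eta) ** lam


def reverse_power_link(eta: float, lam: float, cdf=logistic_cdf) -> float:
    """Assumed reverse power link: p = 1 - F(-eta) ** lam, with lam > 0."""
    if lam <= 0:
        raise ValueError("lam must be positive")
    return 1.0 - cdf(-eta) ** lam


def randomized_quantile_residual(y: int, p: float, rng=random) -> float:
    """Randomized quantile residual for y in {0, 1} under Bernoulli(p)."""
    a = 0.0 if y == 0 else 1.0 - p  # P(Y < y)
    b = 1.0 - p if y == 0 else 1.0  # P(Y <= y)
    u = rng.uniform(a, b)           # draw u uniformly on [a, b]
    # Clamp away from 0 and 1, where the inverse normal CDF is undefined.
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    return _STD_NORMAL.inv_cdf(u)


if __name__ == "__main__":
    # With lam = 1 both links reduce to the symmetric baseline F(eta).
    print(power_link(0.0, 1.0))          # 0.5
    print(power_link(0.0, 2.0))          # 0.25
    print(reverse_power_link(0.0, 2.0))  # 0.75
    rng = random.Random(0)
    print(randomized_quantile_residual(1, 0.9, rng))
```

The parameter λ is what makes these links asymmetric: with λ = 1 both reduce to the baseline F, while λ ≠ 1 skews the response curve in opposite directions for the two links. Under a correctly specified model the randomized quantile residuals are approximately standard normal, which is why they are a natural diagnostic for comparing links on imbalanced data.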
Pages: 1694-1714
Number of pages: 21
Related papers (50 in total)
  • [41] A Performance Analysis of Classifiers on Imbalanced Data
    Garcia, Nathan F.
    Strzoda, Romulo A.
    Lucca, Giancarlo
    Borges, Eduardo N.
    ICEIS: PROCEEDINGS OF THE 24TH INTERNATIONAL CONFERENCE ON ENTERPRISE INFORMATION SYSTEMS - VOL 1, 2022, : 602 - 609
  • [42] The Performance of Allocation Method on Imbalanced Data
    Karakatic, Saso
    Hericko, Marjan
    Podgorelec, Vili
    INFORMATION MODELLING AND KNOWLEDGE BASES XXVIII, 2017, 292 : 382 - 395
  • [43] New links for binary regression: an application to coca cultivation in Peru
    Lemonte, Artur J.
    Bazan, Jorge L.
    TEST, 2018, 27 (03) : 597 - 617
  • [45] Imbalanced Data Classification Based on Hybrid Methods
    Zhang, Nai-Nan
    Ye, Shao-Zhen
    Chien, Ting-Ying
    PROCEEDINGS OF THE 2018 2ND INTERNATIONAL CONFERENCE ON BIG DATA RESEARCH (ICBDR 2018), 2018, : 16 - 20
  • [46] A review of boosting methods for imbalanced data classification
    Li, Qiujie
    Mao, Yaobin
    PATTERN ANALYSIS AND APPLICATIONS, 2014, 17 (04) : 679 - 693
  • [48] Error Backpropagation with Attention Control to Learn Imbalanced Data for Regression
    Lee, Chang Hwa
    Lee, Sang Wan
    2018 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC), 2018, : 2820 - 2824
  • [49] Methods for Data Selection in Medical Databases: The Binary Logistic Regression - Relations with the Calculated Risks
    Dascalu, Cristina G.
    Carausu, Elena Mihaela
    Manuc, Daniela
    PROCEEDINGS OF WORLD ACADEMY OF SCIENCE, ENGINEERING AND TECHNOLOGY, VOL 28, 2008, 28 : 278 - +
  • [50] Automated Bayesian variable selection methods for binary regression models with missing covariate data
    Michael Bergrab
    Christian Aßmann
    AStA Wirtschafts- und Sozialstatistisches Archiv, 2024, 18 (2) : 203 - 244