Performance of asymmetric links and correction methods for imbalanced data in binary regression

Cited by: 10
Authors
Huayanay, Alex de la Cruz [1]
Bazan, Jorge L. [2]
Cancho, Vicente G. [2]
Dey, Dipak K. [3]
Affiliations
[1] USP UFSCar, Interinst Grad Stat, Sao Carlos, SP, Brazil
[2] Univ Sao Paulo, Dept Appl Math & Stat, Sao Carlos, SP, Brazil
[3] Univ Connecticut, Dept Stat, Mansfield, CT USA
Funding
São Paulo Research Foundation, Brazil;
Keywords
Asymmetric link; binary regression; imbalanced data; predictive evaluation; quantile residuals; similarity measures; CROSS-VALIDATION; MODEL; PROBIT;
DOI
10.1080/00949655.2019.1593984
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
In binary regression, imbalanced data arise when values equal to zero (or one) occur in a proportion that is significantly greater than that of the corresponding values of one (or zero). In this work, we evaluate two methods developed to deal with imbalanced data and compare them to the use of asymmetric links. The results of a simulation study show that the correction methods do not adequately correct bias in the estimation of regression coefficients and that the models with the power and reverse power links considered produce better results for certain types of imbalanced data. Additionally, we present an application to imbalanced data, identifying the best model among those proposed. The parameters are estimated using a Bayesian approach based on the Hamiltonian Monte Carlo method with the No-U-Turn Sampler algorithm, and the models are compared using different criteria for model comparison, predictive evaluation and quantile residuals.
Pages: 1694-1714
Number of pages: 21
Related papers
50 in total
  • [31] Performance of the student binary regression model according to the data separation setting
    Leon, Lorena
    Peyhardi, Jean
    Trottier, Catherine
    COMMUNICATIONS IN STATISTICS-SIMULATION AND COMPUTATION, 2025,
  • [32] IRDA: Implicit data augmentation for deep imbalanced regression
    Zhu, Weiyao
    Wu, Ou
    Yang, Nan
    INFORMATION SCIENCES, 2024, 677
  • [33] A Study on the Impact of Data Characteristics in Imbalanced Regression Tasks
    Branco, Paula
    Torgo, Luis
    2019 IEEE INTERNATIONAL CONFERENCE ON DATA SCIENCE AND ADVANCED ANALYTICS (DSAA 2019), 2019, : 193 - 202
  • [34] Multi-output regression for imbalanced data stream
    Peng, Tao
    Sellami, Sana
    Boucelma, Omar
    Chbeir, Richard
    EXPERT SYSTEMS, 2023, 40 (10)
  • [35] Chebyshev approaches for imbalanced data streams regression models
    Ehsan Aminian
    Rita P. Ribeiro
    João Gama
    Data Mining and Knowledge Discovery, 2021, 35 : 2389 - 2466
  • [36] Minimax Optimal Rates With Heavily Imbalanced Binary Data
    Song, Yang
    Zou, Hui
    IEEE TRANSACTIONS ON INFORMATION THEORY, 2024, 70 (12) : 9001 - 9011
  • [37] Chebyshev approaches for imbalanced data streams regression models
    Aminian, Ehsan
    Ribeiro, Rita P.
    Gama, Joao
    DATA MINING AND KNOWLEDGE DISCOVERY, 2021, 35 (06) : 2389 - 2466
  • [38] Asymmetric classifier based on kernel PLS for imbalanced data
    Ma, Ying
    Su, Bing-Huang
    Zhu, Shunzhi
    Weng, Wei
    Huang, Liang
    Hu, Jianqiang
    10TH INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE & EDUCATION (ICCSE 2015), 2015, : 482 - 485
  • [39] Performance of likelihood-based estimation methods for multilevel binary regression models
    Callens, M
    Croux, C
    JOURNAL OF STATISTICAL COMPUTATION AND SIMULATION, 2005, 75 (12) : 1003 - 1017
  • [40] ROBUST BINARY LOGISTIC REGRESSION METHODS
    Li, Hong
    ADVANCES AND APPLICATIONS IN STATISTICS, 2022, 77 : 93 - 108