A comparison of model choice strategies for logistic regression

Author
Karhunen, Markku [1]
Affiliation
[1] Finnish Environm Inst Syke, Built Environm Solut Unit, Latokartanonkaari 11, Helsinki 00790, Finland
Keywords
Model choice; Logistic regression; Logit regression; Monte Carlo simulations; Sensitivity; Specificity; SELECTION; GENE;
DOI
10.2478/jdis-2024-0001
Chinese Library Classification
G25 [Library science, librarianship]; G35 [Information science, information work]
Discipline classification code
1205; 120501
Abstract
Purpose: The purpose of this study is to develop and compare model choice strategies in the context of logistic regression. Model choice means choosing which covariates to include in the model.
Design/methodology/approach: The study is based on Monte Carlo simulations. The methods are compared in terms of three measures of accuracy: specificity and two kinds of sensitivity. A loss function combining sensitivity and specificity is introduced and used for a final comparison.
Findings: The choice of method depends on how heavily users weight sensitivity relative to specificity. It also depends on the sample size. For a typical logistic regression setting with a moderate sample size and a small to moderate effect size, either BIC, BICc or Lasso seems to be optimal.
Research limitations: Numerical simulations cannot cover the whole range of data-generating processes occurring with real-world data. Thus, more simulations are needed.
Practical implications: Researchers can refer to these results if they believe that their data-generating process is somewhat similar to some of the scenarios presented in this paper. Alternatively, they could run their own simulations and calculate the loss function.
Originality/value: This is a systematic comparison of model choice algorithms and heuristics in the context of logistic regression. The distinction between two types of sensitivity and a comparison based on a loss function are methodological novelties.
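As a rough illustration of how a selected set of covariates could be scored, the sketch below computes sensitivity and specificity of a selection against a known true model, then combines them into a single loss. The abstract does not define the paper's loss function or its two kinds of sensitivity, so the weighted sum used here, the weight `w`, and the function names `selection_accuracy` and `weighted_loss` are all assumptions made for illustration only.

```python
from typing import Set, Tuple


def selection_accuracy(
    true_vars: Set[str], selected: Set[str], all_vars: Set[str]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) of a covariate selection.

    Sensitivity: share of truly relevant covariates that were selected.
    Specificity: share of irrelevant covariates that were left out.
    """
    noise_vars = all_vars - true_vars
    sensitivity = len(selected & true_vars) / len(true_vars) if true_vars else 1.0
    specificity = len(noise_vars - selected) / len(noise_vars) if noise_vars else 1.0
    return sensitivity, specificity


def weighted_loss(sensitivity: float, specificity: float, w: float = 0.5) -> float:
    """Hypothetical loss: w weights missed covariates, (1 - w) weights false inclusions.

    This is an assumed form, not the loss function from the paper.
    """
    return w * (1.0 - sensitivity) + (1.0 - w) * (1.0 - specificity)


# Toy example: six candidate covariates, two truly relevant.
all_vars = {"x1", "x2", "x3", "x4", "x5", "x6"}
true_vars = {"x1", "x2"}
selected = {"x1", "x3"}  # e.g. the output of some selection method

sens, spec = selection_accuracy(true_vars, selected, all_vars)
loss = weighted_loss(sens, spec, w=0.5)
print(sens, spec, loss)
```

In a Monte Carlo setting, such a loss would be averaged over many simulated data sets for each selection method; raising `w` favors methods that rarely miss true covariates, while lowering it favors methods that rarely include irrelevant ones.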
Pages: 37-52
Number of pages: 16