An automated exact solution framework towards solving the logistic regression best subset selection problem

Authors
van Niekerk, Thomas K. [1 ]
Venter, Jacques V. [2 ]
Terblanche, Stephanus E. [1 ]
Affiliations
[1] North West Univ, Sch Ind Engn, Potchefstroom, South Africa
[2] North West Univ, Ctr Business Math & Informat, Potchefstroom, South Africa
Keywords
Best subset selection; Independent variable selection; Logistic regression; Mixed integer programming; VARIABLE SELECTION;
DOI
10.37920/sasj.2023.57.2.2
Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics];
Discipline classification code
020208 ; 070103 ; 0714 ;
Abstract
An automated logistic regression solution framework (ALRSF) is proposed to solve a mixed integer programming (MIP) formulation of the well-known logistic regression best subset selection problem. The solution framework first determines the optimal number of independent variables that should be included in the model using an automated cardinality parameter selection procedure. The cardinality parameter dictates the size of the subset of variables and can be problem-specific. A novel regression parameter fixing heuristic that utilises a Benders decomposition algorithm is applied to prune the solution search space so that the optimal regression parameter values are found faster. An optimality gap is subsequently calculated to quantify the quality of the final regression model by considering the distance between the best possible log-likelihood value and a log-likelihood value that is calculated using the current parameter values. Attempts are then made to reduce the optimality gap by adjusting regression parameter values. The ALRSF serves as a holistic variable selection framework that enables the user to consider larger datasets when solving the best subset selection logistic regression problem by significantly reducing the memory requirements associated with its mixed integer programming formulation. Furthermore, the automated framework requires minimal user intervention during model training and hyperparameter tuning. Improvements in the quality of the final model (when considering both the optimality gap and the computing resources required to achieve a result) are observed when the ALRSF is applied to well-known real-world UCI machine learning datasets.
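The quantities the abstract refers to (a log-likelihood at the current parameter values, a cardinality limit on the selected variables, and an optimality gap against the best possible log-likelihood) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the tolerance `tol`, and the choice of a relative gap are assumptions made for illustration, and the record does not state how the ALRSF defines or normalises its gap.

```python
import math


def log_likelihood(beta0, beta, X, y):
    """Logistic regression log-likelihood for intercept beta0 and coefficients beta."""
    total = 0.0
    for row, label in zip(X, y):
        eta = beta0 + sum(b * x for b, x in zip(beta, row))
        # log(1 + exp(eta)), evaluated in a numerically stable way
        softplus = max(eta, 0.0) + math.log1p(math.exp(-abs(eta)))
        total += label * eta - softplus
    return total


def cardinality(beta, tol=1e-9):
    """Number of coefficients treated as non-zero (tol is an assumed threshold)."""
    return sum(1 for b in beta if abs(b) > tol)


def relative_gap(best_bound, incumbent):
    """Assumed gap definition: distance between the best possible and the
    current log-likelihood, scaled by the magnitude of the current value."""
    return abs(best_bound - incumbent) / max(abs(incumbent), 1e-12)


if __name__ == "__main__":
    # Tiny made-up dataset: two candidate variables, four observations.
    X = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.0, 0.0]]
    y = [0, 1, 1, 0]
    beta = [2.0, 0.0]  # only the first variable is selected
    current = log_likelihood(-1.0, beta, X, y)
    print("selected variables:", cardinality(beta))
    print("log-likelihood:", current)
    # The bound 0.0 is the trivial upper limit of any log-likelihood.
    print("gap against bound 0.0:", relative_gap(0.0, current))
```

In a best subset selection setting, a solution is feasible only if `cardinality(beta)` does not exceed the chosen cardinality parameter; a smaller gap means the current parameter values are closer to the best achievable log-likelihood.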
Pages: 89-129
Number of pages: 41