An automated exact solution framework towards solving the logistic regression best subset selection problem

Times cited: 0
Authors
van Niekerk, Thomas K. [1]
Venter, Jacques V. [2]
Terblanche, Stephanus E. [1]
Affiliations
[1] North West Univ, Sch Ind Engn, Potchefstroom, South Africa
[2] North West Univ, Ctr Business Math & Informat, Potchefstroom, South Africa
Keywords
Best subset selection; Independent variable selection; Logistic regression; Mixed integer programming; Variable selection
DOI
10.37920/sasj.2023.57.2.2
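Chinese Library Classification (CLC) numbers
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714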
Abstract
An automated logistic regression solution framework (ALRSF) is proposed to solve a mixed integer programming (MIP) formulation of the well-known logistic regression best subset selection problem. The solution framework first determines the optimal number of independent variables to include in the model using an automated cardinality parameter selection procedure. The cardinality parameter dictates the size of the subset of variables and can be problem-specific. A novel regression parameter fixing heuristic that utilises a Benders decomposition algorithm is applied to prune the solution search space so that the optimal regression parameter values are found faster. An optimality gap is subsequently calculated to quantify the quality of the final regression model, measured as the distance between the best possible log-likelihood value and the log-likelihood value calculated from the current parameter values. Attempts are then made to reduce the optimality gap by adjusting the regression parameter values. The ALRSF serves as a holistic variable selection framework that lets the user consider larger datasets when solving the best subset selection logistic regression problem, because it significantly reduces the memory requirements associated with the mixed integer programming formulation. Furthermore, the automated framework requires minimal user intervention during model training and hyperparameter tuning. Improvements in the quality of the final model (in terms of both the optimality gap and the computing resources required to achieve a result) are observed when the ALRSF is applied to well-known real-world UCI machine learning datasets.
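The abstract describes the underlying problem only in words: choose at most k independent variables (k being the cardinality parameter) so that the logistic log-likelihood is maximised, then judge the result by an optimality gap between the best possible log-likelihood and the one actually achieved. The Python sketch below illustrates those two ideas under stated assumptions. It finds the exact best subset by brute-force enumeration, which is only practical for a handful of variables and is not the ALRSF's MIP formulation or its Benders-based parameter fixing heuristic. The function names (`best_subset`, `fit_logistic`, `optimality_gap`, and so on), the gradient-ascent fitting routine, the toy data, and the relative-gap formula are all hypothetical illustrations, not taken from the paper.

```python
import itertools
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def linear_score(beta, xi, cols):
    """Intercept beta[0] plus beta[1:] applied to the selected columns."""
    return beta[0] + sum(b * xi[j] for b, j in zip(beta[1:], cols))


def log_likelihood(beta, X, y, cols):
    """Bernoulli log-likelihood: sum_i [y_i * z_i - log(1 + exp(z_i))]."""
    ll = 0.0
    for xi, yi in zip(X, y):
        z = linear_score(beta, xi, cols)
        # log(1 + exp(z)) written to avoid overflow for large |z|
        ll += yi * z - (max(z, 0.0) + math.log1p(math.exp(-abs(z))))
    return ll


def fit_logistic(X, y, cols, steps=2000, lr=0.1):
    """Approximate maximum-likelihood fit by plain gradient ascent.

    Hypothetical stand-in for a proper solver; it is not the paper's MIP approach.
    """
    beta = [0.0] * (len(cols) + 1)
    n = len(y)
    for _ in range(steps):
        grad = [0.0] * len(beta)
        for xi, yi in zip(X, y):
            r = yi - sigmoid(linear_score(beta, xi, cols))
            grad[0] += r
            for k, j in enumerate(cols):
                grad[k + 1] += r * xi[j]
        # Ascent step on the average log-likelihood
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


def best_subset(X, y, k):
    """Exact best subset of size k by exhaustive enumeration.

    Enumerating exactly k columns suffices for the constraint ||beta||_0 <= k,
    because at the exact maximum adding a column never lowers the log-likelihood.
    """
    best = None
    for cols in itertools.combinations(range(len(X[0])), k):
        beta = fit_logistic(X, y, cols)
        ll = log_likelihood(beta, X, y, cols)
        if best is None or ll > best[0]:
            best = (ll, cols, beta)
    return best


def optimality_gap(ll_bound, ll_current):
    """Hypothetical relative gap between an upper bound and the current value."""
    return (ll_bound - ll_current) / abs(ll_current)


# Toy data (hypothetical): 8 observations, 3 candidate variables
X = [
    [2.0, 0.1, -1.0], [1.5, -0.3, 0.5], [1.0, 0.2, 1.0], [0.5, 0.4, -0.5],
    [-0.5, -0.2, 0.3], [-1.0, 0.3, -1.2], [-1.5, -0.1, 0.8], [-2.0, 0.0, 0.2],
]
y = [1, 1, 0, 1, 0, 1, 0, 0]

ll, cols, beta = best_subset(X, y, k=1)
# Relaxing the cardinality constraint (all columns) gives an upper bound
all_cols = tuple(range(len(X[0])))
ll_full = log_likelihood(fit_logistic(X, y, all_cols), X, y, all_cols)
print("selected:", cols, "gap:", optimality_gap(ll_full, ll))
```

Using the full model as the bound reflects the fact that relaxing the cardinality constraint can only raise the attainable log-likelihood (exactly so at convergence); a tighter bound would make the gap more informative.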
Pages: 89-129
Page count: 41