Subset selection for multiple linear regression via optimization

Cited by: 11
Authors
Park, Young Woong [1]
Klabjan, Diego [2]
Affiliations
[1] Iowa State Univ, Ivy Coll Business, Ames, IA 50011 USA
[2] Northwestern Univ, Dept Ind Engn & Management Sci, Evanston, IL 60208 USA
Keywords
Multiple linear regression; Subset selection; High dimensional data; Mathematical programming; Linearization; ABSOLUTE ERROR MAE; PROGRAMMING FORMULATIONS; RMSE;
DOI
10.1007/s10898-020-00876-1
Chinese Library Classification (CLC) number
C93 [Management science]; O22 [Operations research];
Subject classification code
070105 ; 12 ; 1201 ; 1202 ; 120202 ;
Abstract
Subset selection in multiple linear regression aims to choose a subset of candidate explanatory variables that trades off fitting error (explanatory power) against model complexity (number of variables selected). We build mathematical programming models for regression subset selection based on mean square and absolute errors, and on minimal-redundancy-maximal-relevance criteria. The proposed models are tested using a linear-program-based branch-and-bound algorithm with tailored valid inequalities and big-M values, and are compared against algorithms from the literature. For high-dimensional cases, an iterative heuristic algorithm is proposed based on the mathematical programming models and a core set concept, and a randomized version of the algorithm is derived to guarantee convergence to the global optimum. The computational experiments show that our models quickly find a good-quality solution, with the remaining time spent proving optimality; that the iterative algorithms find solutions in a relatively short time and are competitive with state-of-the-art algorithms; and that using ad hoc big-M values is not recommended.
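For orientation only, a generic big-M mixed-integer formulation of absolute-error subset selection can be sketched as below. This is a common textbook form and not necessarily the model proposed in the paper: the cardinality cap $k$, the symbols $t_i$, $z_j$, $\beta_j$, and the single bound $M$ are all assumptions introduced here for illustration.

```latex
% Illustrative sketch (assumed notation, not taken from the paper):
% n observations (x_i, y_i), m candidate variables,
% z_j = 1 if variable j is selected, t_i bounds the absolute residual of observation i.
\begin{align*}
\min_{\beta,\, t,\, z} \quad & \frac{1}{n} \sum_{i=1}^{n} t_i \\
\text{s.t.} \quad
  & t_i \ge y_i - \beta_0 - \sum_{j=1}^{m} x_{ij}\,\beta_j, && i = 1,\dots,n, \\
  & t_i \ge -\Bigl( y_i - \beta_0 - \sum_{j=1}^{m} x_{ij}\,\beta_j \Bigr), && i = 1,\dots,n, \\
  & -M z_j \le \beta_j \le M z_j, && j = 1,\dots,m, \\
  & \sum_{j=1}^{m} z_j \le k, \\
  & z_j \in \{0,1\}, \quad t_i \ge 0 .
\end{align*}
```

In a formulation of this kind, $M$ must be at least as large as the magnitude of every optimal coefficient to remain valid, while a looser $M$ weakens the linear-programming relaxation used in branch-and-bound. That tension is consistent with the abstract's caution against ad hoc big-M values.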
Pages: 543-574
Number of pages: 32
Related papers
50 in total
  • [31] SELECTION OF BEST SUBSET IN REGRESSION ANALYSIS
    HOCKING, RR
    LESLIE, RN
    [J]. TECHNOMETRICS, 1967, 9 (04) : 531 - &
  • [32] Variable and subset selection in PLS regression
    Höskuldsson, A
    [J]. CHEMOMETRICS AND INTELLIGENT LABORATORY SYSTEMS, 2001, 55 (1-2) : 23 - 38
  • [33] SELECTION OF BEST SUBSET IN REGRESSION ANALYSIS
    HOCKING, RR
    [J]. TECHNOMETRICS, 1967, 9 (01) : 188 - &
  • [34] SOCP relaxation bounds for the optimal subset selection problem applied to robust linear regression
    Flores, Salvador
    [J]. EUROPEAN JOURNAL OF OPERATIONAL RESEARCH, 2015, 246 (01) : 44 - 50
  • [35] Subset Selection in Linear Regression using Sequentially Normalized Least Squares: Asymptotic Theory
    Maatta, Jussi
    Schmidt, Daniel F.
    Roos, Teemu
    [J]. SCANDINAVIAN JOURNAL OF STATISTICS, 2016, 43 (02) : 382 - 395
  • [36] Variable selection in linear regression models: Choosing the best subset is not always the best choice
    Hanke, Moritz
    Dijkstra, Louis
    Foraita, Ronja
    Didelez, Vanessa
    [J]. BIOMETRICAL JOURNAL, 2023,
  • [37] Variable selection in linear regression models: Choosing the best subset is not always the best choice
    Hanke, Moritz
    Dijkstra, Louis
    Foraita, Ronja
    Didelez, Vanessa
    [J]. BIOMETRICAL JOURNAL, 2024, 66 (01)
  • [38] Subset Selection by Pareto Optimization
    Qian, Chao
    Yu, Yang
    Zhou, Zhi-Hua
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 28 (NIPS 2015), 2015, 28
  • [39] SiPM signal processing via multiple linear regression
    Schmailzl, Wolfgang
    Piemonte, Claudio
    Garutti, Erika
    Hansch, Walter
    [J]. JOURNAL OF INSTRUMENTATION, 2023, 18 (07)
  • [40] FIRE: an SPSS program for variable selection in multiple linear regression analysis via the relative importance of predictors
    Urbano Lorenzo-Seva
    Pere J. Ferrando
    [J]. Behavior Research Methods, 2011, 43 : 1 - 7