Subset selection for multiple linear regression via optimization

Cited by: 11
Authors
Park, Young Woong [1]
Klabjan, Diego [2]
Affiliations
[1] Iowa State Univ, Ivy Coll Business, Ames, IA 50011 USA
[2] Northwestern Univ, Dept Ind Engn & Management Sci, Evanston, IL 60208 USA
Keywords
Multiple linear regression; Subset selection; High dimensional data; Mathematical programming; Linearization; Absolute error (MAE); Programming formulations; RMSE
DOI
10.1007/s10898-020-00876-1
Chinese Library Classification (CLC) number
C93 [Management Science]; O22 [Operations Research]
Discipline classification code
070105; 12; 1201; 1202; 120202
Abstract
Subset selection in multiple linear regression aims to choose a subset of candidate explanatory variables that trades off fitting error (explanatory power) against model complexity (number of variables selected). We build mathematical programming models for regression subset selection based on mean square and absolute errors, and minimal-redundancy-maximal-relevance criteria. The proposed models are tested using a linear-program-based branch-and-bound algorithm with tailored valid inequalities and big M values and are compared against algorithms from the literature. For high dimensional cases, an iterative heuristic algorithm is proposed based on the mathematical programming models and a core set concept, and a randomized version of the algorithm is derived to guarantee convergence to the global optimum. The computational experiments show that our models quickly find a quality solution, with the rest of the time spent proving optimality; that the iterative algorithms find solutions in a relatively short time and are competitive with state-of-the-art algorithms; and that using ad-hoc big M values is not recommended.
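For orientation, the role of the big M values mentioned in the abstract can be illustrated with a generic, textbook-style mixed-integer formulation of subset selection under an absolute-error objective. This is only a sketch and not necessarily one of the models in the paper: the symbols (responses $y_i$, data $x_{ij}$, coefficients $\beta_j$, error bounds $t_i$, selection indicators $z_j$, cardinality limit $k$, and the constant $M$) are all assumptions introduced here for illustration.

```latex
\begin{align*}
\min_{\beta,\, t,\, z} \quad & \sum_{i=1}^{n} t_i \\
\text{s.t.} \quad
  & t_i \ge y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j, && i = 1,\dots,n, \\
  & t_i \ge -\Bigl(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j\Bigr), && i = 1,\dots,n, \\
  & -M z_j \le \beta_j \le M z_j, && j = 1,\dots,p, \\
  & \sum_{j=1}^{p} z_j \le k, \\
  & z_j \in \{0,1\}, && j = 1,\dots,p.
\end{align*}
```

In a formulation of this kind, $z_j = 0$ forces $\beta_j = 0$, so $M$ must be a valid upper bound on $|\beta_j|$ at an optimal solution. A value that is too small can cut off the optimum, while a value that is too large weakens the linear programming relaxation used in branch-and-bound.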
Pages: 543-574
Number of pages: 32
Related papers
50 in total
  • [1] Subset selection for multiple linear regression via optimization
    Young Woong Park
    Diego Klabjan
    [J]. Journal of Global Optimization, 2020, 77 : 543 - 574
  • [2] Subset selection in multiple linear regression in the presence of outlier and multicollinearity
    Jadhav, Nileshkumar H.
    Kashid, Dattatraya N.
    Kulkarni, Subhash R.
    [J]. STATISTICAL METHODOLOGY, 2014, 19 : 44 - 59
  • [3] A more general criterion for subset selection in multiple linear regression
    Kashid, DN
    Kulkarni, SR
    [J]. COMMUNICATIONS IN STATISTICS-THEORY AND METHODS, 2002, 31 (05) : 795 - 811
  • [4] Algorithms for Subset Selection in Linear Regression
    Das, Abhimanyu
    Kempe, David
    [J]. STOC'08: PROCEEDINGS OF THE 2008 ACM INTERNATIONAL SYMPOSIUM ON THEORY OF COMPUTING, 2008, : 45 - 54
  • [5] Subset selection in multiple linear regression: a new mathematical programming approach
    Eksioglu, B
    Demirer, R
    Capar, I
    [J]. COMPUTERS & INDUSTRIAL ENGINEERING, 2005, 49 (01) : 155 - 167
  • [6] Subset selection in multiple linear regression with heavy tailed error distribution
    Kashid, DN
    Kulkarni, SR
    [J]. JOURNAL OF STATISTICAL COMPUTATION AND SIMULATION, 2003, 73 (11) : 791 - 805
  • [7] Group subset selection for linear regression
    Guo, Yi
    Berman, Mark
    Gao, Junbin
    [J]. COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2014, 75 : 39 - 52
  • [8] Feature subset selection for logistic regression via mixed integer optimization
    Sato, Toshiki
    Takano, Yuichi
    Miyashiro, Ryuhei
    Yoshise, Akiko
    [J]. COMPUTATIONAL OPTIMIZATION AND APPLICATIONS, 2016, 64 (03) : 865 - 880
  • [9] Feature subset selection for logistic regression via mixed integer optimization
    Toshiki Sato
    Yuichi Takano
    Ryuhei Miyashiro
    Akiko Yoshise
    [J]. Computational Optimization and Applications, 2016, 64 : 865 - 880
  • [10] A mathematical programming approach for integrated multiple linear regression subset selection and validation
    Chung, Seokhyun
    Park, Young Woong
    Cheong, Taesu
    [J]. PATTERN RECOGNITION, 2020, 108 (108)