Variable selection in Logistic regression model with genetic algorithm

Cited by: 23
Authors
Zhang, Zhongheng [1]
Trevino, Victor [2]
Hoseini, Sayed Shahabuddin [3]
Belciug, Smaranda [4]
Boopathi, Arumugam Manivanna [5]
Zhang, Ping [6]
Gorunescu, Florin [7,8]
Subha, Velappan [9]
Dai, Songshi [10,11]
Affiliations
[1] Zhejiang Univ, Sir Run Run Shaw Hosp, Sch Med, Dept Emergency Med, Hangzhou 310016, Zhejiang, Peoples R China
[2] Escuela Med Tecnol Monterrey, Catedra Bioinformat, Monterrey, Nuevo Leon, Mexico
[3] Mem Sloan Kettering Canc Ctr, Dept Pediat, 1275 York Ave, New York, NY 10021 USA
[4] Univ Craiova, Dept Comp Sci, Fac Sci, Craiova, Romania
[5] Ariyalur Engn Coll, Dept Elect & Elect Engn, Ariyalur, Tamil Nadu, India
[6] Griffith Univ, Menzies Hlth Inst Queensland, Brisbane, Qld, Australia
[7] Univ Pitesti, Dept Math & Comp Sci, Pitesti, Romania
[8] Univ Med & Pharm Craiova, Dept Math Biostat & Informat, Craiova, Romania
[9] Manonmaniam Sundaranar Univ, Dept Comp Sci & Engn, Tirunelveli, Tamil Nadu, India
[10] Zhejiang Univ, Coll Elect Engn, Hangzhou 310027, Zhejiang, Peoples R China
[11] Hangzhou mAIcim Co Ltd, Hangzhou 310058, Zhejiang, Peoples R China
Keywords
Logistic regression; genetic algorithm (GA); variable selection; galgo; package
DOI
10.21037/atm.2018.01.15
Chinese Library Classification number
R73 [Oncology]
Subject classification code
100214
Abstract
Variable or feature selection is one of the most important steps in model specification. Especially in medical decision-making, using a medical database directly, without prior analysis and preprocessing, is often counterproductive. Variable selection is the process of choosing the most relevant attributes from the database in order to build robust learning models and thereby improve the performance of the models used in the decision process. In biomedical research, the purpose of variable selection is to select clinically important and statistically significant variables while excluding unrelated or noise variables. A variety of variable selection methods exist, but none is without limitations. For example, the widely used stepwise approach adds the best variable in each cycle and generally produces an acceptable set of variables; however, it is commonly trapped in local optima. The best subset approach can systematically search the entire covariate pattern space, but with p candidate variables there are 2^p possible subsets, so the solution pool becomes extremely large with the tens to hundreds of variables typical of present-day clinical data. Genetic algorithms (GA) are heuristic optimization approaches that can be used for variable selection in multivariable regression models. This tutorial paper aims to provide a step-by-step approach to the use of GA in variable selection. The R code provided in the text can be extended and adapted to other data analysis needs.
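The abstract does not show the paper's R code, so what follows is only a minimal Python sketch of the general idea under our own assumptions. It is not the tutorial's implementation and does not use the galgo package. Each chromosome is a 0/1 inclusion mask over the candidate predictors, and its fitness is the AIC of the corresponding logistic regression (lower is better). The GA uses tournament selection, single-point crossover, bit-flip mutation and elitism. The function names (`simulate`, `aic`, `ga_select`), the simulated data and all GA parameters are hypothetical choices made for illustration.

```python
import math
import random

def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def log1pexp(t):
    # Numerically stable log(1 + exp(t)).
    return t + math.log1p(math.exp(-t)) if t > 0 else math.log1p(math.exp(t))

def solve(A, b):
    # Gaussian elimination with partial pivoting for the small Newton systems.
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x

def aic(X, y, cols, iters=25, tol=1e-8):
    """AIC of a logistic model with an intercept plus the columns in `cols`."""
    Z = [[1.0] + [row[j] for j in cols] for row in X]
    k = len(cols) + 1
    beta = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        info = [[0.0] * k for _ in range(k)]  # Fisher information X'WX
        for zi, yi in zip(Z, y):
            p = sigmoid(sum(b * z for b, z in zip(beta, zi)))
            w = p * (1.0 - p)
            for a in range(k):
                grad[a] += (yi - p) * zi[a]
                for c in range(k):
                    info[a][c] += w * zi[a] * zi[c]
        step = solve(info, grad)  # Newton-Raphson update
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    loglik = 0.0
    for zi, yi in zip(Z, y):
        eta = sum(b * z for b, z in zip(beta, zi))
        loglik += yi * eta - log1pexp(eta)
    return -2.0 * loglik + 2.0 * k

def simulate(n=200, p=6, seed=0):
    """Hypothetical data: only columns 0, 1 and 2 affect the outcome."""
    rng = random.Random(seed)
    true_beta = [1.5, -1.5, 1.0] + [0.0] * (p - 3)
    X, y = [], []
    for _ in range(n):
        row = [rng.gauss(0.0, 1.0) for _ in range(p)]
        eta = -0.5 + sum(b * x for b, x in zip(true_beta, row))
        X.append(row)
        y.append(1 if rng.random() < sigmoid(eta) else 0)
    return X, y

def ga_select(X, y, pop_size=20, generations=25, p_mut=0.1, seed=1):
    """GA over 0/1 inclusion masks; fitness is AIC (lower is better)."""
    rng = random.Random(seed)
    p = len(X[0])
    cache = {}

    def fitness(mask):
        key = tuple(mask)
        if key not in cache:
            cache[key] = aic(X, y, [j for j, bit in enumerate(key) if bit])
        return cache[key]

    # Random initial population, seeded with the full model.
    pop = [[rng.randint(0, 1) for _ in range(p)] for _ in range(pop_size - 1)]
    pop.append([1] * p)
    for _ in range(generations):
        pop.sort(key=fitness)
        nxt = [pop[0][:], pop[1][:]]  # elitism: carry over the two best masks
        while len(nxt) < pop_size:
            a = min(rng.sample(pop, 3), key=fitness)  # tournament selection
            b = min(rng.sample(pop, 3), key=fitness)
            cut = rng.randrange(1, p)  # single-point crossover
            child = a[:cut] + b[cut:]
            nxt.append([1 - g if rng.random() < p_mut else g for g in child])  # bit-flip mutation
        pop = nxt
    best = min(pop, key=fitness)
    return best, fitness(best)

if __name__ == "__main__":
    X, y = simulate()
    mask, score = ga_select(X, y)
    print("selected columns:", [j for j, bit in enumerate(mask) if bit], "AIC:", round(score, 2))
```

With only p = 6 candidates there are 2^6 = 64 subsets, so here the GA result can still be checked against exhaustive best-subset search. A GA becomes worthwhile only when p is large enough that enumerating all 2^p subsets is impractical. Elitism guarantees that the best AIC in the population never gets worse from one generation to the next.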
Pages: 12