BOSO: A novel feature selection algorithm for linear regression with high-dimensional data

Cited by: 3
Authors
Valcarcel, Luis J. [1, 2]
San Jose-Eneriz, Edurne L. [2, 3]
Cendoya, Xabier [1]
Rubio, Angel L. [1, 4, 5]
Agirre, Xabier [2, 3]
Prosper, Felipe L. [2, 3, 6, 7]
Planes, Francisco [1, 4, 5]
Affiliations
[1] Univ Navarra, Tecnun Escuela Ingn, San Sebastian, Spain
[2] Univ Navarra, CIMA Ctr Invest Med Aplicada, Pamplona, Spain
[3] CIBERONC Ctr Invest Biomed Red Canc, Pamplona, Spain
[4] Univ Navarra, Ctr Ingn Biomed, Pamplona, Spain
[5] Univ Navarra, DATAI Inst Ciencia Datos Inteligencia Artificial, Pamplona, Spain
[6] IdiSNA Inst Invest Sanitaria Navarra, Pamplona, Spain
[7] Clin Univ Navarra, Pamplona, Spain
Keywords
CANCER; DISCOVERY; GENE
DOI
10.1371/journal.pcbi.1010180
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
With the rapid growth of high-dimensional datasets in different biomedical domains, there is an urgent need to develop predictive methods able to deal with this complexity. Feature selection is a relevant strategy in machine learning to address this challenge. We introduce a novel feature selection algorithm for linear regression called BOSO (Bilevel Optimization Selector Operator). We benchmarked BOSO against key algorithms in the literature and found superior accuracy for feature selection in high-dimensional datasets. A proof of concept of BOSO for predicting drug sensitivity in cancer is presented. A detailed analysis is carried out for methotrexate, a well-studied drug targeting cancer metabolism.
Author summary
We present BOSO (Bilevel Optimization Selector Operator), a novel method for feature selection in linear regression models. In machine learning, feature selection consists of identifying the subset of input variables (features) that are truly associated with the response variable to be predicted. Adequate feature selection is particularly relevant for high-dimensional datasets, which are common in biomedical research questions that rely on -omics data, e.g. predictive models of drug sensitivity, resistance or toxicity, construction of gene regulatory networks, biomarker selection or association studies. The need for feature selection is especially acute in many of these problems because the number of features exceeds the number of samples, which makes it harder to obtain accurate and general predictive models. In this context, we show that the models derived by BOSO achieve a better combination of accuracy and simplicity than competing approaches in the literature. The relevance of BOSO is illustrated in the prediction of drug sensitivity of cancer cell lines, using RNA-seq data and drug screenings from the GDSC (Genomics of Drug Sensitivity in Cancer) database. BOSO obtains linear regression models with a similar level of accuracy but involving a substantially lower number of features, which simplifies the interpretation and validation of predictive models.
Pages: 29