Robustness of chemometrics-based feature selection methods in early cancer detection and biomarker discovery

Times cited: 9
Authors
Lee, Hae Woo [1]
Lawton, Carl [1]
Na, Young Jeong [2]
Yoon, Seongkyu [1]
Affiliations
[1] Univ Massachusetts, Dept Chem Engn, Lowell, MA 01854 USA
[2] Harvard Univ, Massachusetts Gen Hosp, Sch Med, Boston, MA USA
Keywords
biomarker discovery; chemometrics; early detection; feature selection; omics; ovarian cancer; reproducibility; stability; CARLO CROSS-VALIDATION; SELDI-TOF MS; VARIABLE SELECTION; OVARIAN-CANCER; BREAST-CANCER; WAVELENGTH SELECTION; MULTIVARIATE CALIBRATION; MASS-SPECTROMETRY; SERUM BIOMARKERS; STABILITY;
DOI
10.1515/sagmb-2012-0067
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
In omics studies aimed at the early detection and diagnosis of cancer, bioinformatics tools play a significant role in analyzing high-dimensional, complex datasets and in identifying a small set of biomarkers. However, the robustness and consistency of the discovered biomarker sets are often ambiguous, because feature selection methods frequently produce irreproducible results. To address this, the stability and the classification power of several chemometrics-based feature selection algorithms were evaluated using Monte Carlo sampling, with the aim of finding the feature selection methods best suited to early cancer detection and biomarker discovery. To this end, two datasets were analyzed, consisting of MALDI-TOF-MS and LC/TOF-MS spectra measured on serum samples for the diagnosis of ovarian cancer. Using these datasets, the stability and the classification power of multiple feature subsets found by different feature selection methods were quantified by varying either the number of selected features or the number of samples in the training set, with special emphasis on stability. The results show that high consistency does not necessarily guarantee high predictive power. In addition, differences in stability, as well as agreement in feature lists among the feature selection methods, depend on several factors, such as the number of available samples, the number of selected features, and the quality of the information in the dataset. Among the tested methods, only the variable importance in projection (VIP)-based method showed complementary properties, providing feature subsets that were both highly consistent and accurate. Moreover, successive projection analysis (SPA) was excellent at maintaining high stability over a wide range of experimental conditions.
The stability of the feature selection methods varies widely, which stresses the importance of choosing among them carefully. Therefore, rather than evaluating the selected features by classification accuracy alone, stability should also be measured to improve the reliability of biomarker discovery.
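The abstract describes estimating feature-selection stability by Monte Carlo resampling of the training set while varying the number of selected features or the number of training samples. The sketch below illustrates that idea for a VIP-based selector; it is not the authors' code. It assumes a PLS1 model fitted by NIPALS on class labels coded 0/1 (PLS-DA style), two latent components, top-k selection by VIP score, and the average pairwise Jaccard index as the stability measure. The abstract does not specify the stability index, the number of components, or the subsampling scheme, and the function names `pls_vip`, `jaccard`, and `monte_carlo_stability` and the synthetic data are hypothetical.

```python
import itertools

import numpy as np


def pls_vip(X, y, n_components=2):
    """VIP scores from a PLS1 model fitted with NIPALS (X: n x p, y: length n).

    Assumption: two components and 0/1 class coding; the abstract gives neither.
    """
    X = X - X.mean(axis=0)
    y = y - y.mean()
    p = X.shape[1]
    weights, ss = [], []
    for _ in range(n_components):
        w = X.T @ y
        w /= np.linalg.norm(w)            # unit-norm weight vector
        t = X @ w                         # latent scores
        tt = t @ t
        loading = X.T @ t / tt            # X-loading
        q = (y @ t) / tt                  # y-loading
        X = X - np.outer(t, loading)      # deflate X
        y = y - q * t                     # deflate y
        weights.append(w)
        ss.append(q ** 2 * tt)            # y-variance explained by this component
    W = np.column_stack(weights)          # p x n_components
    ss = np.array(ss)
    # VIP_j = sqrt(p * sum_a SS_a * w_ja^2 / sum_a SS_a), with ||w_a|| = 1
    return np.sqrt(p * (W ** 2) @ ss / ss.sum())


def jaccard(a, b):
    """Jaccard index between two feature sets."""
    return len(a & b) / len(a | b)


def monte_carlo_stability(X, y, k=10, train_fraction=0.7, n_runs=50, seed=0):
    """Average pairwise Jaccard index of top-k VIP subsets over random subsamples.

    Assumption: the stability measure and subsampling scheme are illustrative
    choices, not necessarily those used in the paper.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    n_train = int(round(train_fraction * n))
    subsets = []
    for _ in range(n_runs):
        idx = rng.choice(n, size=n_train, replace=False)   # Monte Carlo training set
        vip = pls_vip(X[idx], y[idx])
        top_k = np.argsort(vip)[::-1][:k]                   # k highest VIP scores
        subsets.append(frozenset(top_k.tolist()))
    pairs = itertools.combinations(subsets, 2)
    return float(np.mean([jaccard(a, b) for a, b in pairs]))


if __name__ == "__main__":
    # Hypothetical synthetic data: 60 samples, 200 features, first 5 informative.
    rng = np.random.default_rng(42)
    y = np.repeat([0.0, 1.0], 30)
    X = rng.normal(size=(60, 200))
    X[:, :5] += 1.5 * y[:, None]
    for k in (5, 20, 50):
        print(k, round(monte_carlo_stability(X, y, k=k), 3))
```

Calling `monte_carlo_stability` with different `k` or `train_fraction` values mirrors the two experimental axes mentioned in the abstract. Classification accuracy would have to be estimated separately, for example on the samples left out of each draw, since high stability does not by itself guarantee high predictive power.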
Pages: 207–223
Page count: 17