Robustness of chemometrics-based feature selection methods in early cancer detection and biomarker discovery

Cited by: 9
Authors
Lee, Hae Woo [1]
Lawton, Carl [1]
Na, Young Jeong [2]
Yoon, Seongkyu [1]
Affiliations
[1] Univ Massachusetts, Dept Chem Engn, Lowell, MA 01854 USA
[2] Harvard Univ, Massachusetts Gen Hosp, Sch Med, Boston, MA USA
Keywords
biomarker discovery; chemometrics; early detection; feature selection; omics; ovarian cancer; reproducibility; stability; CARLO CROSS-VALIDATION; SELDI-TOF MS; VARIABLE SELECTION; OVARIAN-CANCER; BREAST-CANCER; WAVELENGTH SELECTION; MULTIVARIATE CALIBRATION; MASS-SPECTROMETRY; SERUM BIOMARKERS; STABILITY;
DOI
10.1515/sagmb-2012-0067
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
In omics studies aimed at the early detection and diagnosis of cancer, bioinformatics tools play a significant role in analyzing high-dimensional, complex datasets and in identifying a small set of biomarkers. However, in many cases, there are ambiguities in the robustness and the consistency of the discovered biomarker sets, since feature selection methods often lead to irreproducible results. To address this, both the stability and the classification power of several chemometrics-based feature selection algorithms were evaluated using the Monte Carlo sampling technique, with the aim of finding the most suitable feature selection methods for early cancer detection and biomarker discovery. To this end, two datasets were analyzed, comprising MALDI-TOF-MS and LC/TOF-MS spectra measured on serum samples for the diagnosis of ovarian cancer. Using these datasets, the stability and the classification power of multiple feature subsets found by different feature selection methods were quantified by varying either the number of selected features or the number of samples in the training set, with special emphasis placed on the property of stability. The results show that high consistency does not necessarily guarantee high predictive power. In addition, differences in stability, as well as agreement in feature lists between feature selection methods, depend on several factors, such as the number of available samples, feature sizes, and the quality of the information in the dataset. Among the tested methods, only the variable importance in projection (VIP)-based method shows complementary properties, providing both highly consistent and accurate subsets of features. In addition, successive projection analysis (SPA) was excellent with regard to maintaining high stability over a wide range of experimental conditions.
The stability of several feature selection methods is highly variable, stressing the importance of making the proper choice among feature selection methods. Therefore, rather than evaluating the selected features using only classification accuracy, stability measurements should be examined as well to improve the reliability of biomarker discovery.
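The idea of measuring selection stability by Monte Carlo sampling can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: the toy data generator, the simple mean-difference filter (standing in for methods such as VIP or SPA), the stratified 70% subsampling, and the use of the average pairwise Jaccard index as the stability measure.

```python
import random
from itertools import combinations
from statistics import mean, pstdev


def make_toy_data(n_per_class=20, n_features=20, shift=4.0, seed=1):
    """Hypothetical data: features 0 and 1 are informative, the rest are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
            if label == 1:
                row[0] += shift
                row[1] += shift
            X.append(row)
            y.append(label)
    return X, y


def select_top_k(X, y, k):
    """Assumed selector: rank features by |class mean difference| / overall spread."""
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        spread = pstdev(pos + neg) or 1.0  # guard against constant features
        scores.append(abs(mean(pos) - mean(neg)) / spread)
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return set(ranked[:k])


def jaccard(a, b):
    """Overlap of two feature subsets: |a & b| / |a | b|."""
    return len(a & b) / len(a | b)


def monte_carlo_stability(X, y, k, n_runs=30, train_fraction=0.7, seed=0):
    """Average pairwise Jaccard index of subsets selected on random training splits."""
    rng = random.Random(seed)
    by_class = {
        label: [i for i, lab in enumerate(y) if lab == label] for label in set(y)
    }
    subsets = []
    for _ in range(n_runs):
        # Stratified subsample so that every class is present in each split.
        idx = []
        for members in by_class.values():
            n_take = max(1, int(train_fraction * len(members)))
            idx.extend(rng.sample(members, n_take))
        X_sub = [X[i] for i in idx]
        y_sub = [y[i] for i in idx]
        subsets.append(select_top_k(X_sub, y_sub, k))
    pairs = list(combinations(subsets, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    X, y = make_toy_data()
    for k in (2, 5, 10):
        print(k, round(monte_carlo_stability(X, y, k), 3))
```

A value near 1 means nearly the same features are chosen on every resampled training set; repeating the computation for different values of `k` or `train_fraction` mirrors the abstract's variation of subset size and training-set size. Classification accuracy would need to be assessed separately on the held-out samples, since the abstract notes that high consistency does not guarantee high predictive power.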
Pages: 207-223
Number of pages: 17
Related papers
50 in total
  • [1] A Comparative Study of Feature Selection Methods for Biomarker Discovery
    Mungloo-Dilmohamud, Zahra
    Marigliano, Gary
    Jaufeerally-Fakim, Yasmina
    Pena-Reyes, Carlos
    PROCEEDINGS 2018 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE (BIBM), 2018, : 2789 - 2791
  • [2] Feature Selection Methods for Early Predictive Biomarker Discovery Using Untargeted Metabolomic Data
    Grissa, Dhouha
    Petera, Melanie
    Brandolini, Marion
    Napoli, Amedeo
    Comte, Blandine
    Pujos-Guillot, Estelle
    FRONTIERS IN MOLECULAR BIOSCIENCES, 2016, 3
  • [3] Triple and quadruple optimization for feature selection in cancer biomarker discovery
    Cattelani, L.
    Fortino, V.
    JOURNAL OF BIOMEDICAL INFORMATICS, 2024, 159
  • [4] A Critical Assessment of Feature Selection Methods for Biomarker Discovery in Clinical Proteomics
    Christin, Christin
    Hoefsloot, Huub C. J.
    Smilde, Age K.
    Hoekman, B.
    Suits, Frank
    Bischoff, Rainer
    Horvatovich, Peter
    MOLECULAR & CELLULAR PROTEOMICS, 2013, 12 (01) : 263 - 276
  • [5] Robust Biomarker Discovery for Cancer Diagnosis Based on Meta-Ensemble Feature Selection
    Boucheham, Anouar
    Batouche, Mohamed
    2014 SCIENCE AND INFORMATION CONFERENCE (SAI), 2014, : 452 - 460
  • [6] Chemometrics-Based Approach to Feature Selection of Chromatographic Profiles and its Application to Search Active Fraction of Herbal Medicine
    Chen, Chao
    Yuan, Jie
    Li, Xiao-Jie
    Shen, Zhi-Bin
    Yu, Dao-Hai
    Zhu, Jun-Fang
    Zeng, Fan-Lin
    CHEMICAL BIOLOGY & DRUG DESIGN, 2013, 81 (06) : 688 - 694
  • [7] Stable feature selection for biomarker discovery
    He, Zengyou
    Yu, Weichuan
    COMPUTATIONAL BIOLOGY AND CHEMISTRY, 2010, 34 (04) : 215 - 225
  • [8] A Novel Approach for Feature Selection Based on MapReduce for Biomarker Discovery
    Kourid, Ahlem
    Batouche, Mohamed
    INTERNATIONAL CONFERENCE ON COMPUTER VISION AND IMAGE ANALYSIS APPLICATIONS, 2015,
  • [9] Chemometrics-based signal processing methods for biosensors in health and environment: A review
    Wu, Wanqing
    Yang, Jianlei
    Zhou, Yu
    Zheng, Qinggong
    Chen, Qing
    Bai, Zhaoao
    Niu, Jiaqi
    ELECTROANALYSIS, 2024, 36 (07)
  • [10] A novel class dependent feature selection method for cancer biomarker discovery
    Zhou, Wengang
    Dickerson, Julie A.
    COMPUTERS IN BIOLOGY AND MEDICINE, 2014, 47 : 66 - 75