Robust biomarker discovery for hepatocellular carcinoma from high-throughput data by multiple feature selection methods

Cited by: 17
Authors
Zhang, Zishuang [1]
Liu, Zhi-Ping [1, 2]
Affiliations
[1] Shandong Univ, Sch Control Sci & Engn, Dept Biomed Engn, Jinan 250061, Shandong, Peoples R China
[2] Shandong Univ, Ctr Intelligent Med, Jinan 250061, Shandong, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Biomarker discovery; Omics data; Feature selection; Akaike information criterion; Hepatocellular carcinoma; IDENTIFICATION; DISEASES;
DOI
10.1186/s12920-021-00957-4
Chinese Library Classification
Q3 [Genetics];
Discipline classification codes
071007; 090102;
Abstract
Background: Hepatocellular carcinoma (HCC) is one of the most common cancers. The discovery of specific genes serving as biomarkers is of paramount significance for cancer diagnosis and prognosis. The high-throughput omics data generated by The Cancer Genome Atlas (TCGA) consortium provide a valuable resource for the discovery of HCC biomarker genes. Numerous methods have been proposed to select cancer biomarkers. However, these methods have not investigated the robustness of identification across different feature selection techniques.

Methods: We use six different recursive feature elimination methods to select the gene signatures of HCC from TCGA liver cancer data. The genes shared by the six selected subsets are proposed as robust biomarkers. The Akaike information criterion (AIC) is employed to explain the optimization process of feature selection, which provides a statistical interpretation for feature selection in machine learning methods. We also use several methods to validate the screened biomarkers.

Results: We propose a robust method for discovering biomarker genes for HCC from gene expression data. Specifically, we implement recursive feature elimination with cross-validation (RFE-CV) based on six different classification algorithms. The overlaps among the gene sets discovered by the different methods are taken as the identified biomarkers. We interpret the machine learning-based feature selection process statistically using AIC. Furthermore, the features selected by backward stepwise logistic regression under AIC minimization are completely contained in the identified biomarkers. The classification results verify the superiority of the interpretable robust biomarker discovery method.

Conclusions: The overlaps among the gene subsets selected by the RFE-CV of six classifiers contain different numbers of features. The AIC values in model selection provide a theoretical foundation for the feature selection process of biomarker discovery via machine learning. Moreover, genes contained in more of the optimally selected subsets make better biological sense. Intersecting the biomarkers selected by different classifiers improves the quality of feature selection. This is a general method suitable for screening biomarkers of complex diseases from high-throughput data.
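
The selection step described in the abstract (RFE-CV run with several classifiers, then an intersection of the selected subsets) can be sketched roughly as follows. This is a minimal illustration and not the authors' code. It assumes scikit-learn is available, uses synthetic data in place of the TCGA expression matrix, and uses three arbitrarily chosen estimators as stand-ins, since the abstract does not name its six classifiers.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import LinearSVC

# Synthetic stand-in for an expression matrix (samples x genes); not TCGA data.
X, y = make_classification(
    n_samples=150, n_features=20, n_informative=5, n_redundant=0,
    shuffle=False, random_state=0,
)

# Illustrative estimators only (assumption): each exposes coef_ or
# feature_importances_, which RFECV needs in order to rank features.
estimators = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "linear_svm": LinearSVC(max_iter=10000),
    "random_forest": RandomForestClassifier(n_estimators=30, random_state=0),
}

cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)

selected = {}
for name, estimator in estimators.items():
    # Recursively drop one feature per step; keep the subset size with the
    # best cross-validated accuracy.
    selector = RFECV(estimator, step=1, cv=cv, scoring="accuracy")
    selector.fit(X, y)
    # support_ is a boolean mask over columns; keep the surviving indices.
    selected[name] = {i for i, keep in enumerate(selector.support_) if keep}

# Features retained by every selector are the "robust" candidates.
robust = set.intersection(*selected.values())
print(sorted(robust))
```

The intersection can be empty when the selectors disagree strongly, so a real analysis would inspect the per-classifier subsets in `selected` as well as `robust`.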
Pages: 12