Robust biomarker discovery for hepatocellular carcinoma from high-throughput data by multiple feature selection methods

Cited by: 17
Authors
Zhang, Zishuang [1]
Liu, Zhi-Ping [1, 2]
Affiliations
[1] Shandong Univ, Sch Control Sci & Engn, Dept Biomed Engn, Jinan 250061, Shandong, Peoples R China
[2] Shandong Univ, Ctr Intelligent Med, Jinan 250061, Shandong, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Biomarker discovery; Omics data; Feature selection; Akaike information criterion; Hepatocellular carcinoma; IDENTIFICATION; DISEASES
DOI
10.1186/s12920-021-00957-4
Chinese Library Classification
Q3 [Genetics]
Discipline classification codes
071007; 090102
Abstract
Background: Hepatocellular carcinoma (HCC) is one of the most common cancers. The discovery of specific genes serving as biomarkers is of paramount significance for cancer diagnosis and prognosis. The high-throughput omics data generated by The Cancer Genome Atlas (TCGA) consortium provide a valuable resource for the discovery of HCC biomarker genes. Numerous methods have been proposed to select cancer biomarkers. However, these methods have not investigated the robustness of identification across different feature selection techniques.

Methods: We use six different recursive feature elimination methods to select the gene signatures of HCC from TCGA liver cancer data. The genes shared by the six selected subsets are proposed as robust biomarkers. The Akaike information criterion (AIC) is employed to explain the optimization process of feature selection, which provides a statistical interpretation for feature selection in machine learning methods. We also use several methods to validate the screened biomarkers.

Results: In this paper, we propose a robust method for discovering biomarker genes for HCC from gene expression data. Specifically, we implement recursive feature elimination with cross-validation (RFE-CV) based on six different classification algorithms. The overlaps among the gene sets discovered by the different methods are referred to as the identified biomarkers. We give a statistical interpretation of the machine-learning feature selection process using AIC. Furthermore, the features selected by backward stepwise logistic regression under the minimum-AIC criterion are completely contained in the identified biomarkers. The classification results verify the superiority of the interpretable robust biomarker discovery method.

Conclusions: We find that the overlaps among the gene subsets contain different quantitative features selected by the RFE-CV of the six classifiers. The AIC values in the model selection provide a theoretical foundation for the feature selection process of biomarker discovery via machine learning. Moreover, genes contained in more of the optimally selected subsets make better biological sense. The quality of feature selection is improved by intersecting the biomarkers selected by different classifiers. This is a general method suitable for screening biomarkers of complex diseases from high-throughput data.
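The core idea of the abstract, running RFE-CV with several classifiers and keeping only the features that every run selects, can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: it assumes scikit-learn's `RFECV`, uses three classifiers rather than the paper's six (which are not named in this record), and runs on synthetic data instead of TCGA expression profiles. The helper name `robust_feature_set` is hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier


def robust_feature_set(X, y, estimators, cv):
    """Run RFE-CV once per estimator and intersect the selected feature indices.

    Hypothetical helper for illustration only; returns the shared indices
    and the per-estimator selections.
    """
    per_model = []
    for est in estimators:
        # RFECV needs an estimator exposing coef_ or feature_importances_.
        selector = RFECV(est, step=1, cv=cv, scoring="accuracy")
        selector.fit(X, y)
        per_model.append({int(i) for i in np.flatnonzero(selector.support_)})
    shared = set.intersection(*per_model)
    return shared, per_model


# Synthetic stand-in for a samples-by-genes expression matrix (assumption).
X, y = make_classification(
    n_samples=120, n_features=20, n_informative=4, n_redundant=0, random_state=0
)

# Three illustrative classifiers; the paper uses six.
estimators = [
    LogisticRegression(max_iter=1000),
    LinearSVC(dual=False, max_iter=5000),
    DecisionTreeClassifier(random_state=0),
]
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

shared, per_model = robust_feature_set(X, y, estimators, cv)
print("features selected per classifier:", [sorted(s) for s in per_model])
print("features shared by all classifiers:", sorted(shared))
```

Taking the intersection is deliberately conservative: a feature survives only if every classifier's elimination path keeps it, so the shared set may be small or even empty on noisy data.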
Pages: 12
Related papers
50 in total
  • [31] High-throughput proteomics for suicide biomarker discovery in patients with bipolar disorder
    Sandberg, J. V.
    EUROPEAN NEUROPSYCHOPHARMACOLOGY, 2020, 40 : S418 - S418
  • [32] MultiGeMS: detection of SNVs from multiple samples using model selection on high-throughput sequencing data
    Murillo, Gabriel H.
    You, Na
    Su, Xiaoquan
    Cui, Wei
    Reilly, Muredach P.
    Li, Mingyao
    Ning, Kang
    Cui, Xinping
    BIOINFORMATICS, 2016, 32 (10) : 1486 - 1492
  • [33] A Critical Assessment of Feature Selection Methods for Biomarker Discovery in Clinical Proteomics
    Christin, Christin
    Hoefsloot, Huub C. J.
    Smilde, Age K.
    Hoekman, B.
    Suits, Frank
    Bischoff, Rainer
    Horvatovich, Peter
    MOLECULAR & CELLULAR PROTEOMICS, 2013, 12 (01) : 263 - 276
  • [34] Identifying Biomarkers of Hepatocellular Carcinoma Based on Gene Co-Expression Network from High-Throughput Data
    Zhang, Ying
    Liu, Zhiping
    Li, Jing-song
    MEDINFO 2017: PRECISION HEALTHCARE THROUGH INFORMATICS, 2017, 245 : 667 - 671
  • [35] Choice of High-Throughput Proteomics Method Affects Data Integration with Transcriptomics and the Potential Use in Biomarker Discovery
    Junior, Sergio Mosquim
    Siino, Valentina
    Ryden, Lisa
    Vallon-Christersson, Johan
    Levander, Fredrik
    CANCERS, 2022, 14 (23)
  • [36] High-throughput discovery of metal oxides with high thermoelectric performance via interpretable feature engineering on small data
    Ma, Shengluo
    Rao, Yongchao
    Huang, Xiang
    Ju, Shenghong
    MATERIALS TODAY PHYSICS, 2024, 45
  • [37] A high-throughput test enables specific detection of hepatocellular carcinoma
    Cheishvili, David
    Wong, Chifat
    Karim, Mohammad Mahbubul
    Kibria, Mohammad Golam
    Jahan, Nusrat
    Das, Pappu Chandra
    Yousuf, Md. Abul Khair
    Islam, Md. Atikul
    Das, Dulal Chandra
    Noor-E-Alam, Sheikh Mohammad
    Szyf, Moshe
    Alam, Sarwar
    Khan, Wasif A.
    Al Mahtab, Mamun
    NATURE COMMUNICATIONS, 2023, 14 (01)
  • [39] A Bayesian method for biological pathway discovery from high-throughput experimental data
    Wang, W
    Cooper, GF
    2004 IEEE COMPUTATIONAL SYSTEMS BIOINFORMATICS CONFERENCE, PROCEEDINGS, 2004: 645 - 646
  • [40] Editorial: The protagonism of bioanalytical methods in high-throughput drug discovery
    de Moraes, Marcela Cristina
    de Almeida, Fernando Goncalves
    Tinoco, Luzineide Wanderley
    FRONTIERS IN ANALYTICAL SCIENCE, 2023, 3