Biomarker discovery from high-throughput data by connected network-constrained support vector machine

Cited by: 4
Authors
Li, Lingyu [1]
Liu, Zhi-Ping [1]
Affiliations
[1] Shandong Univ, Sch Control Sci & Engn, Jinan 250061, Shandong, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Network-constrained support vector machine; Biomarker discovery; Connectivity; Feature selection; High-throughput data; Breast cancer; NONCONCAVE PENALIZED LIKELIHOOD; VARIABLE SELECTION; GENE-EXPRESSION; R-PACKAGE; CLASSIFICATION; REGRESSION; NUMBER; LASSO;
DOI
10.1016/j.eswa.2023.120179
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
From a systems biology perspective, genes usually work collaboratively in the form of a network, e.g., cancer-related genes participate in an integrative dysfunctional pathway. Thus, feature gene selection that considers the graph or network structure plays a crucial role in cancer biomarker discovery from high-throughput omics data. The network-based paradigm demonstrates that integrating gene expression data with gene networks can improve classification performance and generate more interpretable feature subsets. In this paper, we propose an embedded connected network-constrained support vector machine (CNet-SVM) method that keeps the selected features in an inherent graph structure when discovering biomarker genes. First, we mathematically formulate the CNet-SVM model as a convex optimization problem constrained by network connectivity inequalities and theoretically investigate the behaviors of all tuning parameters to provide search guidance on the regularization path. Second, to check whether the genes selected by CNet-SVM can be studied as network-structured biomarkers, we conduct experiments on several simulation datasets and real-world breast cancer (BRCA) datasets to validate its classification and prediction capabilities. The results show that CNet-SVM not only maintains sparsity and smoothness, but also considers the connectivity constraints between genes when selecting features on a prior gene-gene interaction network from omics data. In particular, CNet-SVM identifies 32 BRCA biomarker genes, which form a connected network component and can potentially be used for BRCA diagnosis. Furthermore, comparisons with eight feature selection-empowered SVM methods demonstrate that the easily interpretable networked feature genes discovered by CNet-SVM are more closely related to BRCA dysfunctions. Finally, we validate that the identified biomarkers achieve high prediction accuracy on external independent cohorts.
Overall, the results demonstrate that the proposed CNet-SVM method is effective in selecting connected-network-structured features and can serve as an alternative improvement to current SVM models for biomarker identification from high-throughput data. The data and code are available at https://github.com/zpliulab/CNet-SVM.
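As a rough illustration of the sparsity and smoothness terms mentioned in the abstract, the sketch below fits a linear SVM with an L1 penalty and a graph-Laplacian penalty by proximal subgradient descent. It is not the authors' CNet-SVM: the connectivity inequality constraints are omitted, the Laplacian form of the smoothness penalty is an assumption, and every name and parameter (`fit_network_svm`, `lam1`, `lam2`, the toy data) is hypothetical. The authors' actual implementation is in the repository linked above.

```python
import numpy as np

def graph_laplacian(adj):
    """Unnormalized Laplacian L = D - A of a symmetric adjacency matrix."""
    return np.diag(adj.sum(axis=1)) - adj

def objective(w, b, X, y, lap, lam1, lam2):
    """Mean hinge loss + L1 sparsity penalty + Laplacian smoothness penalty."""
    hinge = np.maximum(0.0, 1.0 - y * (X @ w + b)).mean()
    return hinge + lam1 * np.abs(w).sum() + lam2 * (w @ lap @ w)

def fit_network_svm(X, y, adj, lam1=0.01, lam2=0.1, lr=0.05, epochs=500):
    """Proximal subgradient descent; labels y must be in {-1, +1}."""
    n, p = X.shape
    lap = graph_laplacian(adj)
    w, b = np.zeros(p), 0.0
    for _ in range(epochs):
        active = y * (X @ w + b) < 1.0          # samples violating the margin
        grad_w = -(X[active] * y[active, None]).sum(axis=0) / n
        grad_w += 2.0 * lam2 * (lap @ w)        # gradient of w^T L w
        grad_b = -y[active].sum() / n
        w = w - lr * grad_w
        b = b - lr * grad_b
        # Soft-thresholding: proximal step for the L1 penalty.
        w = np.sign(w) * np.maximum(np.abs(w) - lr * lam1, 0.0)
    return w, b

# Toy data (assumed): 6 "genes", the first 3 informative and linked in a chain.
rng = np.random.default_rng(0)
y = np.where(rng.random(100) < 0.5, -1.0, 1.0)
X = rng.normal(size=(100, 6))
X[:, :3] += 1.5 * y[:, None]
adj = np.zeros((6, 6))
adj[0, 1] = adj[1, 0] = adj[1, 2] = adj[2, 1] = 1.0

w, b = fit_network_svm(X, y, adj)
accuracy = (np.sign(X @ w + b) == y).mean()
print("weights:", np.round(w, 3), "training accuracy:", accuracy)
```

Genes with nonzero weight would be the selected features in this simplified setting; nothing here guarantees that they form a connected subnetwork, which is the property the connectivity constraints in CNet-SVM are designed to enforce.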
Pages: 12