Biomarker discovery from high-throughput data by connected network-constrained support vector machine

Cited by: 4
Authors
Li, Lingyu [1]
Liu, Zhi-Ping [1]
Affiliation
[1] Shandong Univ, Sch Control Sci & Engn, Jinan 250061, Shandong, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Network-constrained support vector machine; Biomarker discovery; Connectivity; Feature selection; High-throughput data; Breast cancer; NONCONCAVE PENALIZED LIKELIHOOD; VARIABLE SELECTION; GENE-EXPRESSION; R-PACKAGE; CLASSIFICATION; REGRESSION; NUMBER; LASSO
DOI
10.1016/j.eswa.2023.120179
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
From a systems biology perspective, genes usually work collaboratively in the form of a network; for example, cancer-related genes participate in an integrative dysfunctional pathway. Feature gene selection that accounts for graph or network structure therefore plays a crucial role in discovering cancer biomarkers from high-throughput omics data. The network-based paradigm shows that integrating gene expression data with gene networks can improve classification performance and yield more interpretable feature subsets. In this paper, we propose an embedded connected network-constrained support vector machine (CNet-SVM) method that keeps the selected features within an inherent graph structure when discovering biomarker genes. First, we formulate the CNet-SVM model mathematically as a convex optimization problem constrained by network connectivity inequalities, and we theoretically analyze the behavior of all tuning parameters to guide the search along the regularization path. Second, to check whether the genes selected by CNet-SVM can serve as network-structured biomarkers, we conduct experiments on several simulated datasets and real-world breast cancer (BRCA) datasets to validate its classification and prediction capabilities. The results show that CNet-SVM not only maintains sparsity and smoothness but also accounts for connectivity constraints between genes when selecting features on a prior gene-gene interaction network from omics data. In particular, CNet-SVM identifies 32 BRCA biomarker genes that form a connected network component and could potentially be used for BRCA diagnosis. Furthermore, comparisons with eight feature selection-empowered SVM methods demonstrate that the easily interpretable networked feature genes discovered by CNet-SVM are more closely related to BRCA dysfunctions. Finally, we validate that the identified biomarkers achieve high prediction accuracy on external independent cohorts.
All results show that the proposed CNet-SVM method is effective in selecting connected network-structured features and offers an improved alternative to current SVM models for biomarker identification from high-throughput data. The data and code are available at https://github.com/zpliulab/CNet-SVM.
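The abstract names three properties of the selected features: sparsity, smoothness over a prior gene-gene network, and connectivity. The Python sketch below illustrates them under stated assumptions. It scores a linear SVM with a common sparsity-plus-smoothness objective (average hinge loss + lam1 * sum of |w_k| + lam2 * sum over network edges (i, j) of (w_i - w_j)^2). This is an assumed illustrative form, not the paper's exact CNet-SVM formulation, and it does not encode the connectivity inequalities inside the optimization. Instead, it checks connectivity after the fact: the genes with nonzero weights should induce a single connected subgraph of the prior network, as the 32 BRCA biomarker genes are reported to do. The function names, the edge-list representation and the zero-weight tolerance are hypothetical choices made for illustration.

```python
from collections import deque


def network_svm_objective(w, b, X, y, edges, lam1, lam2):
    """Average hinge loss + lam1 * L1 norm + lam2 * edge-smoothness penalty.

    Assumed illustrative form (sparsity + smoothness); NOT the exact
    CNet-SVM objective, and it omits the connectivity inequalities.
    Labels y must be +1 or -1; edges are (i, j) index pairs of the prior network.
    """
    hinge = 0.0
    for xi, yi in zip(X, y):
        margin = yi * (sum(wk * xk for wk, xk in zip(w, xi)) + b)
        hinge += max(0.0, 1.0 - margin)
    hinge /= len(y)
    sparsity = sum(abs(wk) for wk in w)                      # favors few nonzero genes
    smoothness = sum((w[i] - w[j]) ** 2 for i, j in edges)  # linked genes get similar weights
    return hinge + lam1 * sparsity + lam2 * smoothness


def selected_genes(w, tol=1e-8):
    """Indices of features with nonzero weight (tol is a hypothetical threshold)."""
    return {k for k, wk in enumerate(w) if abs(wk) > tol}


def is_connected_subgraph(nodes, edges):
    """True if `nodes` induce one connected component in the prior network."""
    nodes = set(nodes)
    if not nodes:
        return False  # an empty selection is not treated as a network biomarker
    adjacency = {v: [] for v in nodes}
    for i, j in edges:
        if i in nodes and j in nodes:  # keep only edges inside the selection
            adjacency[i].append(j)
            adjacency[j].append(i)
    start = next(iter(nodes))
    seen = {start}
    queue = deque([start])
    while queue:  # breadth-first search from an arbitrary selected gene
        v = queue.popleft()
        for u in adjacency[v]:
            if u not in seen:
                seen.add(u)
                queue.append(u)
    return seen == nodes


if __name__ == "__main__":
    edges = [(0, 1), (1, 2), (3, 4)]  # toy prior network over 5 genes
    w_connected = [0.5, 0.4, 0.3, 0.0, 0.0]
    w_split = [0.5, 0.0, 0.3, 0.0, 0.0]
    print(is_connected_subgraph(selected_genes(w_connected), edges))  # True
    print(is_connected_subgraph(selected_genes(w_split), edges))      # False
```

In the toy example, genes 0 and 2 are only linked through gene 1, so dropping gene 1 from the selection breaks connectivity even though both remaining genes sit in the same network region. On real data the weights would come from a fitted model; this sketch only evaluates a given weight vector and its support.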
Pages: 12
Related papers (50 in total)
  • [31] High-throughput and machine learning approaches for the discovery of metal organic frameworks
    Zhang, Xiangyu
    Xu, Zezhao
    Wang, Zidi
    Liu, Huiyu
    Zhao, Yingbo
    Jiang, Shan
    APL MATERIALS, 2023, 11 (06)
  • [32] Enabling Catalyst Discovery through Machine Learning and High-Throughput Experimentation
    Williams, Travis
    McCullough, Katherine
    Lauterbach, Jochen A.
    CHEMISTRY OF MATERIALS, 2020, 32 (01) : 157 - 165
  • [33] Discovery of New Plasmonic Metals via High-Throughput Machine Learning
    Shapera, Ethan P.
    Schleife, Andre
    ADVANCED OPTICAL MATERIALS, 2022, 10 (18)
  • [34] Biomarker discovery using dry-lab technologies and high-throughput screening
    Chang, Hao-Teng
    BIOMARKERS IN MEDICINE, 2016, 10 (06) : 559 - 561
  • [35] High-Throughput Tear Proteomics via In-Capillary Digestion for Biomarker Discovery
    Xiao, James
    Frenia, Kyla
    Garwood, Kathleen C.
    Kimmel, Jeremy
    Labriola, Leanne T.
    INTERNATIONAL JOURNAL OF MOLECULAR SCIENCES, 2024, 25 (22)
  • [36] Data processing for high-throughput mass spectrometry in drug discovery
    Liu, Chang
    Zhang, Hui
    EXPERT OPINION ON DRUG DISCOVERY, 2024, 19 (07) : 815 - 825
  • [37] High-Throughput Discovery of Synthetic Surfaces That Support Proliferation of Pluripotent Cells
    Derda, Ratmir
    Musah, Samira
    Orner, Brendan P.
    Klim, Joseph R.
    Li, Lingyin
    Kiessling, Laura L.
    JOURNAL OF THE AMERICAN CHEMICAL SOCIETY, 2010, 132 (04) : 1289 - 1295
  • [38] High-throughput screening in drug metabolism and pharmacokinetic support of drug discovery
    White, RE
    ANNUAL REVIEW OF PHARMACOLOGY AND TOXICOLOGY, 2000, 40 : 133 - 157
  • [39] Interval support vector regression enables high-throughput machine learning predictions for dielectric constant of polymer dielectrics
    Yi, Y.
    Wang, L. M.
    Yin, F. H.
    APPLIED PHYSICS LETTERS, 2021, 118 (22)
  • [40] A general approach for discriminative de novo motif discovery from high-throughput data
    Grau, Jan
    Posch, Stefan
    Grosse, Ivo
    Keilwagen, Jens
    NUCLEIC ACIDS RESEARCH, 2013, 41 (21)