Biomarker discovery from high-throughput data by connected network-constrained support vector machine

Cited by: 4
Authors
Li, Lingyu [1]
Liu, Zhi-Ping [1]
Affiliation
[1] Shandong Univ, Sch Control Sci & Engn, Jinan 250061, Shandong, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Network-constrained support vector machine; Biomarker discovery; Connectivity; Feature selection; High-throughput data; Breast cancer; NONCONCAVE PENALIZED LIKELIHOOD; VARIABLE SELECTION; GENE-EXPRESSION; R-PACKAGE; CLASSIFICATION; REGRESSION; NUMBER; LASSO;
DOI
10.1016/j.eswa.2023.120179
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
From a systems biology perspective, genes usually work collaboratively in the form of a network; e.g., cancer-related genes participate in an integrative dysfunctional pathway. Thus, feature gene selection that considers the graph or network structure plays a crucial role in cancer biomarker discovery from high-throughput omics data. The network-based paradigm demonstrates that integrating gene expression data with gene networks can improve classification performance and yield more interpretable feature subsets. In this paper, we propose an embedded connected network-constrained support vector machine (CNet-SVM) method that keeps the selected features within an inherent graph structure when discovering biomarker genes. First, we formulate the CNet-SVM model mathematically as a convex optimization problem constrained by network connectivity inequalities, and we theoretically investigate the behavior of all tuning parameters to guide the search along the regularization path. Second, to check whether the genes selected by CNet-SVM can serve as network-structured biomarkers, we conduct experiments on several simulated datasets and real-world breast cancer (BRCA) datasets to validate its classification and prediction capabilities. The results show that CNet-SVM not only maintains sparsity and smoothness but also accounts for the connectivity constraints between genes when selecting features on a prior gene-gene interaction network from omics data. In particular, CNet-SVM identifies 32 BRCA biomarker genes that form a connected network component and could potentially be used for BRCA diagnosis. Furthermore, comparisons with eight feature-selection-empowered SVM methods show that the easily interpretable networked feature genes discovered by CNet-SVM are more closely related to BRCA dysfunctions. Finally, we validate that the identified biomarkers achieve high prediction accuracy on external independent cohorts.
All results show that the proposed CNet-SVM method is effective in selecting features with a connected network structure and can serve as an improved alternative to current SVM models for biomarker identification from high-throughput data. The data and code are available at https://github.com/zpliulab/CNet-SVM.
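The abstract does not state the CNet-SVM objective or its connectivity inequalities, so the following is only a minimal sketch under stated assumptions. It pairs a linear SVM hinge loss with an L1 term for sparsity and a graph-Laplacian smoothness term over a prior gene-gene network. This is a standard network-regularization idea swapped in here for illustration, not the published formulation, and it is fitted by proximal subgradient descent. Unlike CNet-SVM, which is described as building connectivity into the optimization as constraints, the sketch only checks afterwards, with a breadth-first search, whether the selected genes form one connected component. The function names `train_network_svm`, `predict` and `is_connected`, the weights `lam1`/`lam2`, and the toy data are all hypothetical.

```python
from collections import deque


def train_network_svm(X, y, edges, lam1=0.01, lam2=0.1, lr=0.05, epochs=500):
    """Fit a linear SVM with L1 and graph-smoothness penalties.

    Hypothetical objective (NOT the published CNet-SVM formulation):
        mean_i max(0, 1 - y_i (w.x_i + b))
        + lam1 * sum_j |w_j|
        + lam2 * sum_{(i, j) in edges} (w_i - w_j) ** 2
    minimized by proximal subgradient descent.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    b = 0.0
    for _ in range(epochs):
        gw = [0.0] * p
        gb = 0.0
        # Subgradient of the mean hinge loss: only margin violators contribute.
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:
                for j in range(p):
                    gw[j] -= yi * xi[j] / n
                gb -= yi / n
        # Gradient of the network penalty pulls weights of linked genes together.
        for i, j in edges:
            d = 2 * lam2 * (w[i] - w[j])
            gw[i] += d
            gw[j] -= d
        # Gradient step followed by soft-thresholding (proximal step for L1).
        for j in range(p):
            v = w[j] - lr * gw[j]
            shrunk = max(abs(v) - lr * lam1, 0.0)
            w[j] = shrunk if v >= 0 else -shrunk
        b -= lr * gb
    return w, b


def predict(w, b, x):
    """Class label (+1 or -1) from the sign of the decision function."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


def is_connected(selected, edges):
    """Breadth-first search: do the selected genes form one connected subgraph?"""
    selected = set(selected)
    if not selected:
        return False  # an empty selection is not treated as a valid module
    adj = {g: [] for g in selected}
    for i, j in edges:
        if i in selected and j in selected:
            adj[i].append(j)
            adj[j].append(i)
    start = next(iter(selected))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen == selected


# Toy usage (hypothetical data): genes 0 and 1 are informative and linked,
# genes 2 and 3 are constant and linked only to each other.
X = [[2.0, 2.0, 0.0, 0.0], [1.5, 2.0, 0.0, 0.0],
     [-2.0, -2.0, 0.0, 0.0], [-1.5, -2.0, 0.0, 0.0]]
y = [1, 1, -1, -1]
edges = [(0, 1), (2, 3)]
w, b = train_network_svm(X, y, edges)
selected = [j for j, wj in enumerate(w) if abs(wj) > 1e-8]
print("selected genes:", selected, "connected:", is_connected(selected, edges))
```

In this toy case only genes 0 and 1 receive nonzero weights, and the search confirms they are linked. On real data, an L1 penalty plus a smoothness term gives no such guarantee, since the selected genes may scatter across several components; this is the gap that explicit connectivity constraints such as those in CNet-SVM are meant to address.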
Pages: 12