Support vector machine model of developmental brain gene expression data for prioritization of Autism risk gene candidates

Cited by: 49
Authors
Cogill, S. [1]
Wang, L. [1]
Affiliations
[1] Clemson Univ, Dept Biochem & Genet, Clemson, SC 29634 USA
Keywords
LONG NONCODING RNAS; SPECTRUM DISORDERS; PREDICTION; KNOWLEDGEBASE; IMPLICATE; EVOLUTION; CHILDREN; INSIGHTS; GENCODE; DNA;
DOI
10.1093/bioinformatics/btw498
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Motivation: Autism spectrum disorders (ASD) are a group of neurodevelopmental disorders with clinical heterogeneity and a substantial polygenic component. High-throughput methods for ASD risk gene identification produce numerous candidate genes that are time-consuming and expensive to validate. Prioritization methods can identify high-confidence candidates. Previous ASD gene prioritization methods have focused on a priori knowledge, which excludes genes with little functional annotation or no protein product, such as long non-coding RNAs (lncRNAs).
Results: We have developed a support vector machine (SVM) model, trained using brain developmental gene expression data, for the classification and prioritization of ASD risk genes. The selected feature model had a mean accuracy of 76.7%, mean specificity of 77.2% and mean sensitivity of 74.4%. Gene lists composed of an ASD risk gene and adjacent genes were ranked using the model's decision function output. The known ASD risk genes were ranked on average in the 77.4th, 78.4th and 80.7th percentile for sets of 101, 201 and 401 genes, respectively. Of 10,840 lncRNA genes, 63 were classified as ASD-associated candidates with a confidence greater than 0.95. Genes previously associated with brain development and neurodevelopmental disorders were prioritized highly within the lncRNA gene list.
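The evaluation quantities named in the abstract (accuracy, sensitivity, specificity, and the percentile rank of a known risk gene within a gene list) can be illustrated with a minimal Python sketch. This is not the authors' code. The function names, the gene labels, and the scores are hypothetical, and the percentile definition used here (the share of the other genes in the list that score lower) is an assumption, since the abstract does not state the exact formula.

```python
def classification_metrics(y_true, y_pred):
    """Return (accuracy, sensitivity, specificity) for binary labels.

    Label 1 marks an ASD risk gene, label 0 a non-risk gene.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return accuracy, sensitivity, specificity


def percentile_rank(scores, target):
    """Percentage of the other genes whose decision score is below the target's.

    `scores` maps gene name -> classifier decision score (higher = more
    ASD-like). This definition is an assumption made for illustration.
    """
    others = [s for name, s in scores.items() if name != target]
    below = sum(1 for s in others if s < scores[target])
    return 100.0 * below / len(others)


# Hypothetical decision scores for one known risk gene and four adjacent genes.
example_scores = {
    "known_gene": 1.2,
    "neighbor_a": -0.5,
    "neighbor_b": 0.3,
    "neighbor_c": 1.5,
    "neighbor_d": -1.1,
}
example_rank = percentile_rank(example_scores, "known_gene")
example_metrics = classification_metrics([1, 1, 0, 0, 0], [1, 0, 0, 0, 1])
```

In this toy list, three of the four adjacent genes score below the known gene, so it falls in the 75th percentile under the assumed definition. The same calculation would apply unchanged to lists of 101, 201 or 401 genes.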
Pages: 3611-3618
Page count: 8