Predicting Gene Ontology functions based on support vector machines and statistical significance estimation

Cited by: 12
Authors
Bi, Ran [1]
Zhou, Yanhong [1]
Lu, Feng [1]
Wang, Weiqiang [1]
Affiliations
[1] Huazhong Univ Sci & Technol, Hubei Bioinformat & Mol Imaging Key Lab, Wuhan 430074, Hubei, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
protein function; Gene Ontology; support vector machines; statistical significance
DOI
10.1016/j.neucom.2006.10.006
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Gene Ontology (GO) is a common language for the functional annotation of gene products. We have developed a computational tool, GOKey, to predict the GO function of proteins based on their sequence features and the support vector machine (SVM) method. To improve prediction performance, GOKey adopts several measures: improved handling of unbalanced positive and negative training data, and postprocessing strategies that evaluate the posterior probability and statistical significance of SVM outputs. GOKey has been trained to predict the 36 GO categories of the 'molecular function' of GO slims, and could be easily extended to other GO categories. The results of 5-fold cross validation with 10,603 GO-mapped proteins demonstrate that the performance of GOKey is better than that of standard SVMs. Comparisons with other computational tools for GO function prediction also show that the performance of GOKey is satisfactory. Further, GOKey has been applied to predict the GO functions for 5381 novel human proteins in the Ensembl database. The results show that 93% of the novel proteins can be assigned one or more GO terms, and some evidence supporting the predictions has been found. GOKey can be accessed at http://infosci.hust.edu.cn. (c) 2006 Published by Elsevier B.V.
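The abstract names three ingredients (handling of unbalanced training data, posterior probabilities, and statistical significance of SVM outputs) without giving their formulas. The following is a minimal sketch of one common way each could be realized, not GOKey's actual procedure: inverse-frequency class weights, a Platt-style sigmoid, and an empirical p-value against the scores of known negatives. All function names and the parameters `a` and `b` are hypothetical.

```python
import math
from bisect import bisect_left


def class_weights(n_pos, n_neg):
    """Inverse-frequency weights (an assumption, not taken from the paper).

    Each class receives the same total weight, so a rare positive class
    is not swamped by a large negative class during training.
    """
    total = n_pos + n_neg
    return {1: total / (2 * n_pos), -1: total / (2 * n_neg)}


def sigmoid_posterior(score, a=-2.0, b=0.0):
    """Platt-style mapping P(y=1 | score) = 1 / (1 + exp(a*score + b)).

    a and b are placeholder values here; in practice they would be
    fitted on held-out SVM decision values.
    """
    return 1.0 / (1.0 + math.exp(a * score + b))


def empirical_p_value(score, null_scores_sorted):
    """Share of null (negative-class) scores that are >= score.

    null_scores_sorted must be sorted in ascending order. The +1 terms
    keep the estimate strictly above zero for a finite null sample.
    """
    n = len(null_scores_sorted)
    n_ge = n - bisect_left(null_scores_sorted, score)
    return (n_ge + 1) / (n + 1)


if __name__ == "__main__":
    null_scores = sorted([-1.0, -0.5, 0.0, 0.5, 2.0])
    print(class_weights(10, 90))
    print(sigmoid_posterior(1.5))
    print(empirical_p_value(1.5, null_scores))
```

Under this sketch, a prediction would be reported only when its posterior is high and its p-value is below a chosen threshold; both thresholds are design choices that the abstract does not specify.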
Pages: 718-725
Page count: 8
Related papers
50 in total
  • [41] Support Vector Machines for predicting protein structural class
    Cai, Yu-Dong
    Liu, Xiao-Jun
    Xu, Xue-biao
    Zhou, Guo-Ping
    BMC BIOINFORMATICS, 2001, 2 (1)
  • [42] An ensemble of support vector machines for predicting virulent proteins
    Nanni, Loris
    Lumini, Alessandra
    EXPERT SYSTEMS WITH APPLICATIONS, 2009, 36 (04) : 7458 - 7462
  • [43] Support Vector Machines for predicting protein structural class
    Yu-Dong Cai
    Xiao-Jun Liu
    Xue-biao Xu
    Guo-Ping Zhou
    BMC Bioinformatics, 2
  • [44] Online Signature Verification With Support Vector Machines Based on LCSS Kernel Functions
    Gruber, Christian
    Gruber, Thiemo
    Krinninger, Sebastian
    Sick, Bernhard
    IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART B-CYBERNETICS, 2010, 40 (04) : 1088 - 1100
  • [45] Comparison of Binary Classification Based on Signed Distance Functions with Support Vector Machines
    Boczko, Erik M.
    Xie, Minhui
    Wu, Di
    Young, Todd
    2009 OHIO COLLABORATIVE CONFERENCE ON BIOINFORMATICS, PROCEEDINGS, 2009, : 139 - +
  • [46] Research on polynomial functions for smoothing support vector machines
    Liu, Ye-Qing
    Liu, San-Yang
    Gu, Ming-Tao
    Xi Tong Gong Cheng Yu Dian Zi Ji Shu/Systems Engineering and Electronics, 2009, 31 (06) : 1450 - 1453
  • [47] A Reformulation of Support Vector Machines for General Confidence Functions
    Guo, Yuhong
    Schuurmans, Dale
    ADVANCES IN MACHINE LEARNING, PROCEEDINGS, 2009, 5828 : 109 - +
  • [48] Support vector machines on the space of Walsh functions and their properties
    Fazekas, A
    Kotropoulos, C
    Buciu, I
    Pitas, I
    ISPA 2001: PROCEEDINGS OF THE 2ND INTERNATIONAL SYMPOSIUM ON IMAGE AND SIGNAL PROCESSING AND ANALYSIS, 2001, : 43 - 48
  • [49] Knowledge based support vector machines
    Fukunaga, T
    Ibaraki, T
    ENGINEERING INTELLIGENT SYSTEMS FOR ELECTRICAL ENGINEERING AND COMMUNICATIONS, 2005, 13 (04) : 259 - 267
  • [50] Road junction background reconstruction based on median estimation and support vector machines
    Liu, Shuan
    Dong, Jun-Yu
    Wang, Sheng-Ke
    Chen, Guo-Jiang
    PROCEEDINGS OF 2006 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-7, 2006, : 4200 - +