Predicting Gene Ontology functions based on support vector machines and statistical significance estimation

Cited by: 12
Authors
Bi, Ran [1]
Zhou, Yanhong [1]
Lu, Feng [1]
Wang, Weiqiang [1]
Affiliation
[1] Huazhong Univ Sci & Technol, Hubei Bioinformat & Mol Imaging Key Lab, Wuhan 430074, Hubei, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
protein function; Gene Ontology; support vector machines; statistical significance
DOI
10.1016/j.neucom.2006.10.006
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Gene Ontology (GO) is a common language for the functional annotation of gene products. We have developed a computational tool, GOKey, that predicts the GO functions of proteins from their sequence features using the support vector machine (SVM) method. Several measures have been adopted to improve the prediction performance of GOKey, including improved handling of the imbalance between positive and negative training data and postprocessing strategies that evaluate the posterior probability and statistical significance of SVM outputs. GOKey has been trained to predict the 36 'molecular function' categories of the GO slims and could easily be extended to other GO categories. The results of 5-fold cross-validation with 10,603 GO-mapped proteins demonstrate that GOKey performs better than standard SVMs. Comparisons with other computational tools for GO function prediction also show that the performance of GOKey is satisfactory. Further, GOKey has been applied to predict the GO functions of 5,381 novel human proteins in the Ensembl database. The results show that 93% of the novel proteins can be assigned one or more GO terms, and some evidence supporting the predictions has been found. GOKey can be accessed at http://infosci.hust.edu.cn. © 2006 Published by Elsevier B.V.
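The abstract does not say how GOKey computes the posterior probability or the statistical significance of an SVM output. As a rough illustration only, the sketch below shows two common ways such postprocessing is done: a Platt-style sigmoid that maps a raw decision value to a probability, and an empirical p-value measured against the scores of known negatives. The function names, the sigmoid parameters `a` and `b`, and the example scores are all hypothetical and are not taken from the paper.

```python
import math
from bisect import bisect_left


def platt_posterior(score, a=-2.0, b=0.0):
    """Map a raw SVM decision value to an estimate of P(positive | score).

    a and b are placeholder values (assumption); in practice they would be
    fitted on held-out decision values rather than fixed by hand.
    """
    return 1.0 / (1.0 + math.exp(a * score + b))


def empirical_p_value(score, negative_scores):
    """Estimate how often a known negative scores at least as high as `score`.

    Uses add-one smoothing so the estimate is never exactly zero.
    """
    ordered = sorted(negative_scores)
    n_at_least = len(ordered) - bisect_left(ordered, score)
    return (n_at_least + 1) / (len(ordered) + 1)


# Hypothetical decision values for proteins known NOT to carry a GO term.
negative_scores = [-1.8, -1.2, -0.9, -0.4, -0.1, 0.3, -2.5, -0.7, -1.1]

posterior = platt_posterior(1.0)                     # above 0.5 for a positive score
p_value = empirical_p_value(1.0, negative_scores)    # small when few negatives score this high
```

A prediction would then be reported only when the posterior is high and the p-value is low; the thresholds are a design choice that the abstract does not specify.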
Pages: 718-725
Page count: 8