Automatic annotation of protein motif function with Gene Ontology terms

Cited by: 23
Authors
Lu, XH
Zhai, CX
Gopalakrishnan, V
Buchanan, BG
Affiliations
[1] Medical University of South Carolina, Department of Biostatistics, Bioinformatics & Epidemiology, Charleston, SC 29425, USA
[2] University of Illinois, Department of Computer Science, Urbana, IL 61801, USA
[3] University of Pittsburgh, Center for Biomedical Informatics, Pittsburgh, PA 15213, USA
DOI
10.1186/1471-2105-5-122
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: Conserved protein sequence motifs are short stretches of amino acid sequence patterns that potentially encode the function of proteins. Several sequence pattern searching algorithms and programs exist for identifying candidate protein motifs at the whole-genome level. However, a much-needed and important task is to determine the functions of the newly identified protein motifs. The Gene Ontology (GO) project is an endeavor to annotate the function of genes or protein sequences with terms from a dynamic, controlled vocabulary, and these annotations serve well as a knowledge base.

Results: This paper presents methods to mine the GO knowledge base and to use the association between the GO terms assigned to a sequence and the motifs matched by the same sequence as evidence for automatically predicting the functions of novel protein motifs. The task of assigning GO terms to protein motifs is viewed both as a binary classification problem and as an information retrieval problem, where PROSITE motifs are used as samples for model training and functional prediction. The mutual information between a motif and a GO term is found to be a very useful feature. We take advantage of the known motifs to train a logistic regression classifier, which allows us to combine mutual information with other frequency-based features and to obtain the probability that an association is correct. The trained logistic regression model has intuitively meaningful and logically plausible parameter values, and it performs very well empirically according to our evaluation criteria.

Conclusions: In this research, different methods for the automatic annotation of protein motifs have been investigated. Empirical results demonstrate that the methods have great potential for detecting and augmenting information about the functions of newly discovered candidate protein motifs.
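The abstract names two ingredients: the mutual information of a motif/GO-term pair, and a logistic regression that combines it with frequency-based features into a probability of correct association. The Python sketch below illustrates both under stated assumptions and is not the authors' implementation. It assumes that mutual information is computed between two binary indicators over a set of annotated sequences (does a sequence match the motif; is it annotated with the GO term), that the two frequency-based features are the fraction of motif-matching sequences carrying the term and the log co-occurrence count (a hypothetical choice), and that the model is fitted by plain batch gradient descent. All counts, and the identifiers `GO:term_A` and `GO:term_B`, are invented placeholders, not PROSITE or GO data.

```python
import math

# Minimal sketch, NOT the authors' implementation. Assumption: each motif/GO-term
# pair is summarised by counts over n_total annotated sequences.


def mutual_information(n_both, n_motif, n_term, n_total):
    """MI (in nats) between binary indicators X (sequence matches the motif)
    and Y (sequence is annotated with the GO term)."""
    counts = {  # 2x2 contingency table of joint counts
        (1, 1): n_both,
        (1, 0): n_motif - n_both,
        (0, 1): n_term - n_both,
        (0, 0): n_total - n_motif - n_term + n_both,
    }
    p_x = {1: n_motif / n_total, 0: 1 - n_motif / n_total}
    p_y = {1: n_term / n_total, 0: 1 - n_term / n_total}
    mi = 0.0
    for (x, y), c in counts.items():
        if c > 0:  # empty cells contribute 0 (0 * log 0 := 0)
            p_xy = c / n_total
            mi += p_xy * math.log(p_xy / (p_x[x] * p_y[y]))
    return mi


def pair_features(n_both, n_motif, n_term, n_total):
    """Hypothetical feature vector: MI plus two frequency-based features."""
    return [
        mutual_information(n_both, n_motif, n_term, n_total),
        n_both / n_motif,      # fraction of motif-matching sequences with the term
        math.log(1 + n_both),  # log co-occurrence count
    ]


def sigmoid(z):
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w, b, x):
    """Probability that the motif/GO-term association is correct."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logistic(X, y, lr=0.3, epochs=10000):
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = predict(w, b, xi) - yi  # d(log-loss)/d(logit)
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Toy training pairs (invented counts over N sequences):
# (n_both, n_motif, n_term, label), label 1 = known correct association.
N = 1000
TOY_PAIRS = [
    (40, 45, 60, 1), (30, 32, 50, 1), (25, 30, 80, 1), (50, 55, 70, 1),
    (3, 45, 200, 0), (2, 32, 150, 0), (5, 30, 300, 0), (1, 55, 90, 0),
]
X = [pair_features(nb, nm, nt, N) for nb, nm, nt, _ in TOY_PAIRS]
y = [pair[3] for pair in TOY_PAIRS]
w, b = train_logistic(X, y)
probs = [predict(w, b, x) for x in X]

# Information-retrieval view: rank candidate GO terms for one motif.
# The identifiers are placeholders, not real GO terms.
candidates = {"GO:term_A": (40, 45, 60), "GO:term_B": (3, 45, 200)}
ranking = sorted(
    candidates,
    key=lambda t: predict(w, b, pair_features(*candidates[t], N)),
    reverse=True,
)
print(ranking, [round(p, 3) for p in probs])
```

Sorting candidate terms by predicted probability corresponds to the information-retrieval view of the task, while thresholding the probability at 0.5 gives the binary-classification view. A library solver could replace the hand-written gradient descent; it is spelled out here only to keep the sketch dependency-free.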
Pages: 11
Related papers
  • [1] Automatic annotation of protein motif function with Gene Ontology terms
    Lu, Xinghua
    Zhai, Chengxiang
    Gopalakrishnan, Vanathi
    Buchanan, Bruce G
    BMC BIOINFORMATICS, 2004, 5
  • [2] On gene ontology and function annotation
    Pal, Debnath
    BIOINFORMATION, 2006, 1 (03): 97-98
  • [3] Automatic extraction of gene ontology annotation and its correlation with clusters in protein networks
    Daraselia, Nikolai
    Yuryev, Anton
    Egorov, Sergei
    Mazo, Ilya
    Ispolatov, Iaroslav
    BMC BIOINFORMATICS, 2007, 8
  • [4] Automatic annotation of protein function
    Valencia, A
    CURRENT OPINION IN STRUCTURAL BIOLOGY, 2005, 15 (03): 267-274
  • [5] Using reasoning to guide annotation with gene ontology terms in GOAT
    Bada, N
    Turi, D
    McEntire, R
    Stevens, R
    SIGMOD RECORD, 2004, 33 (02): 27-32
  • [6] Association Rule Mining of Gene Ontology Annotation Terms for SGD
    Nagar, Anurag
    Hahsler, Michael
    Al-Mubaid, Hisham
    2015 IEEE CONFERENCE ON COMPUTATIONAL INTELLIGENCE IN BIOINFORMATICS AND COMPUTATIONAL BIOLOGY (CIBCB), 2015: 458-464
  • [7] Annotation of gene products in the literature with gene ontology terms using syntactic dependencies
    Kim, JJ
    Park, JC
    NATURAL LANGUAGE PROCESSING - IJCNLP 2004, 2005, 3248: 787-796
  • [8] GOblet: Annotation of anonymous sequence data with Gene Ontology and Pathway terms
    Groth, Detlef
    Hartmann, Stefanie
    Panopoulou, Georgia
    Poustka, Albert J.
    Hennig, Steffen
    JOURNAL OF INTEGRATIVE BIOINFORMATICS, 2008, 5 (02)
  • [9] Protein annotation from protein interaction networks and Gene Ontology
    Nguyen, Cao D.
    Gardiner, Katheleen J.
    Cios, Krzysztof J.
    JOURNAL OF BIOMEDICAL INFORMATICS, 2011, 44 (05): 824-829