Automatic annotation of protein motif function with Gene Ontology terms

Cited by: 23
Authors
Lu, XH
Zhai, CX
Gopalakrishnan, V
Buchanan, BG
Affiliations
[1] Med Univ S Carolina, Dept Biostat Bioinformat & Epidemiol, Charleston, SC 29425 USA
[2] Univ Illinois, Dept Comp Sci, Urbana, IL 61801 USA
[3] Univ Pittsburgh, Ctr Biomed Informat, Pittsburgh, PA 15213 USA
Keywords
DOI
10.1186/1471-2105-5-122
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: Conserved protein sequence motifs are short stretches of amino acid sequence patterns that potentially encode the function of proteins. Several sequence pattern searching algorithms and programs exist for identifying candidate protein motifs at the whole-genome level. However, a much needed and important task is to determine the functions of the newly identified protein motifs. The Gene Ontology (GO) project is an endeavor to annotate the function of genes or protein sequences with terms from a dynamic, controlled vocabulary, and these annotations serve well as a knowledge base.
Results: This paper presents methods to mine the GO knowledge base and use the association between the GO terms assigned to a sequence and the motifs matched by the same sequence as evidence for predicting the functions of novel protein motifs automatically. The task of assigning GO terms to protein motifs is viewed as both a binary classification and an information retrieval problem, where PROSITE motifs are used as samples for model training and functional prediction. The mutual information of a motif and a GO term association is found to be a very useful feature. We take advantage of the known motifs to train a logistic regression classifier, which allows us to combine mutual information with other frequency-based features and obtain a probability of correct association. The trained logistic regression model has intuitively meaningful and logically plausible parameter values, and performs very well empirically according to our evaluation criteria.
Conclusions: In this research, different methods for automatic annotation of protein motifs have been investigated. Empirical results demonstrate that the methods have great potential for detecting and augmenting information about the functions of newly discovered candidate protein motifs.
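As a rough illustration of the mutual information feature mentioned in the abstract, the sketch below computes the standard mutual information between two binary events: "a sequence matches the motif" and "a sequence is annotated with the GO term". The abstract does not give the paper's exact formulation, so the textbook definition is assumed here, and the function name and count parameters are hypothetical.

```python
import math


def mutual_information(n_both, n_motif, n_term, n_total):
    """Mutual information (in bits) between two binary events.

    Assumed inputs (hypothetical names, not taken from the paper):
      n_both  - sequences that match the motif AND carry the GO term
      n_motif - sequences that match the motif
      n_term  - sequences annotated with the GO term
      n_total - all sequences considered
    """
    # Counts for the four cells of the 2x2 contingency table,
    # keyed by (matches motif?, has GO term?).
    counts = {
        (1, 1): n_both,
        (1, 0): n_motif - n_both,
        (0, 1): n_term - n_both,
        (0, 0): n_total - n_motif - n_term + n_both,
    }
    p_motif = {1: n_motif / n_total, 0: 1 - n_motif / n_total}
    p_term = {1: n_term / n_total, 0: 1 - n_term / n_total}

    mi = 0.0
    for (m, t), count in counts.items():
        if count == 0:
            continue  # an empty cell contributes nothing (0 * log 0 := 0)
        p_joint = count / n_total
        mi += p_joint * math.log2(p_joint / (p_motif[m] * p_term[t]))
    return mi
```

Under these assumptions, a motif and a GO term that occur independently score 0 bits, while a pair that always occur together in half of the sequences scores 1 bit. A value like this could serve as one input feature to a classifier such as the logistic regression model the abstract describes.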
Pages: 11