Automatic annotation of protein motif function with Gene Ontology terms

Cited by: 23
Authors
Lu, XH
Zhai, CX
Gopalakrishnan, V
Buchanan, BG
Affiliations
[1] Med Univ S Carolina, Dept Biostat Bioinformat & Epidemiol, Charleston, SC 29425 USA
[2] Univ Illinois, Dept Comp Sci, Urbana, IL 61801 USA
[3] Univ Pittsburgh, Ctr Biomed Informat, Pittsburgh, PA 15213 USA
DOI
10.1186/1471-2105-5-122
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Subject classification code
071010 ; 081704 ;
Abstract
Background: Conserved protein sequence motifs are short stretches of amino acid sequence patterns that potentially encode the function of proteins. Several sequence pattern searching algorithms and programs exist for identifying candidate protein motifs at the whole genome level. However, a much needed and important task is to determine the functions of the newly identified protein motifs. The Gene Ontology (GO) project is an endeavor to annotate the function of genes or protein sequences with terms from a dynamic, controlled vocabulary, and these annotations serve well as a knowledge base. Results: This paper presents methods to mine the GO knowledge base and use the association between the GO terms assigned to a sequence and the motifs matched by the same sequence as evidence for predicting the functions of novel protein motifs automatically. The task of assigning GO terms to protein motifs is viewed as both a binary classification and information retrieval problem, where PROSITE motifs are used as samples for model training and functional prediction. The mutual information of a motif and a GO term association is found to be a very useful feature. We take advantage of the known motifs to train a logistic regression classifier, which allows us to combine mutual information with other frequency-based features and obtain a probability of correct association. The trained logistic regression model has intuitively meaningful and logically plausible parameter values, and performs very well empirically according to our evaluation criteria. Conclusions: In this research, different methods for automatic annotation of protein motifs have been investigated. Empirical results demonstrate that the methods have great potential for detecting and augmenting information about the functions of newly discovered candidate protein motifs.
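The abstract names the mutual information between a motif and a GO term as a key feature but does not give its formula. The following is a minimal sketch of one common formulation, assuming both are treated as binary events over a set of annotated sequences ("sequence matches the motif" and "sequence carries the GO term"). The function name, its parameters, and the choice of a 2x2 contingency table are illustrative assumptions, not the authors' implementation.

```python
import math


def mutual_information(n_both, n_motif, n_term, n_total):
    """Mutual information (in bits) between two binary events.

    Hypothetical inputs, all counted over the same set of sequences:
      n_both  - sequences that match the motif AND carry the GO term
      n_motif - sequences that match the motif
      n_term  - sequences annotated with the GO term
      n_total - all sequences considered
    """
    # Build the 2x2 contingency table: key is (motif present?, term present?).
    cells = {
        (1, 1): n_both,
        (1, 0): n_motif - n_both,
        (0, 1): n_term - n_both,
        (0, 0): n_total - n_motif - n_term + n_both,
    }
    if n_total <= 0 or any(count < 0 for count in cells.values()):
        raise ValueError("counts are inconsistent")

    # Marginal probabilities of each event being present (1) or absent (0).
    p_motif = {1: n_motif / n_total, 0: 1 - n_motif / n_total}
    p_term = {1: n_term / n_total, 0: 1 - n_term / n_total}

    mi = 0.0
    for (m, t), count in cells.items():
        if count == 0:
            continue  # an empty cell contributes nothing to the sum
        p_joint = count / n_total
        mi += p_joint * math.log2(p_joint / (p_motif[m] * p_term[t]))
    return mi
```

Under this formulation, a motif and a term that co-occur no more often than chance score near zero, while a pair that always appear together scores highest. In the approach the abstract describes, such a score would be one input to the logistic regression classifier alongside other frequency-based features.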
Pages: 11