A language modeling text mining approach to the annotation of protein community

Authors
Zhang, Xiaodan [1 ]
Wu, Daniel D. [1 ]
Zhou, Xiaohua [1 ]
Hu, Xiaohua [1 ]
Affiliation
[1] Drexel Univ, Coll Informat Sci & Techno, 3141 Chestnut, Philadelphia, PA 19104 USA
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
This paper discusses an ontology-based language modeling text mining approach to the annotation of protein communities. Communities appear to play an important role in the functional properties of complex networks. Being able to annotate the identified community structure in a biological network can help us better understand the structure and dynamics of biological systems. Traditional resources such as Gene Ontology (GO) provide information about the functionality of gene products, but they are not sufficient for annotating a community, for three reasons: only a limited number of proteins are in the database, only limited protein properties are available for annotation, and a group of gene products cannot be annotated as a whole. We therefore present an ontology-based mixture language model approach to annotating protein communities. Compared with traditional methods, our approach has three advantages. First, biomedical literature mining brings much richer information than existing gene databases. Second, the mixture language model can help "purify" the document by eliminating some background noise. Third, using a domain ontology, we extract biological concepts and concept pairs from abstracts. Biological concepts are more meaningful than words or multi-word phrases. Moreover, concept pairs can deliver much more information and serve as evidence for the annotation results. We test our approach on four communities, SAGA-SRB, CCR-NOT, RFC, and ARP2/3, detected from a dataset of interactions for Saccharomyces cerevisiae from the General Repository for Interaction Datasets (GRID). The annotation results provide a very coherent indication of the functionality of each community.
Pages: 12 / +
Page count: 2