A language modeling text mining approach to the annotation of protein community

Cited by: 0
Authors
Zhang, Xiaodan [1]
Wu, Daniel D. [1]
Zhou, Xiaohua [1]
Hu, Xiaohua [1]
Affiliations
[1] Drexel Univ, Coll Informat Sci & Techno, 3141 Chestnut, Philadelphia, PA 19104 USA
Keywords
DOI
Not available
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
This paper discusses an ontology-based, language-modeling text mining approach to the annotation of protein communities. Communities appear to play an important role in the functional properties of complex networks. Being able to annotate the community structure identified in a biological network can help us better understand the structure and dynamics of biological systems. Traditional resources such as Gene Ontology (GO) provide information about the functionality of gene products, but they are not sufficient for annotating communities: only a limited number of proteins are covered in the database, only limited protein properties are available for annotation, and a group of gene products cannot be annotated as a whole. We therefore present an ontology-based mixture language model approach to annotating protein communities. Compared with traditional methods, our approach has three advantages. First, biomedical literature mining brings much richer information than existing gene databases. Second, the mixture language model can help "purify" a document by eliminating some background noise. Third, using a domain ontology, we extract biological concepts and concept pairs from abstracts. A biological concept is more meaningful than a word or multi-word phrase; moreover, concept pairs deliver much more information and serve as evidence for the annotation results. We test our approach on four communities, SAGA-SRB, CCR-NOT, RFC, and ARP2/3, detected from a dataset of interactions for Saccharomyces cerevisiae from the General Repository for Interaction Datasets (GRID). The annotation results provide a very coherent indication of the functionality of each community.
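The abstract does not spell out how the mixture language model removes background noise. One common formulation, shown here only as a minimal sketch and not as the authors' method, treats each term occurrence as drawn either from a document-specific topic model or from a collection-wide background model, and fits the topic model with EM. The function name `mixture_em`, the fixed background weight `lam`, and the toy counts are all assumptions made for illustration; in the paper's setting the tokens would presumably be ontology concepts or concept pairs rather than raw words.

```python
from collections import Counter


def mixture_em(doc_tokens, background, lam=0.5, iters=50):
    """Fit p(w | topic) for one document under a two-component mixture:

        p(w) = (1 - lam) * p(w | topic) + lam * p(w | background)

    `background` maps term -> background probability and `lam` is a fixed
    background weight. Both are illustrative assumptions, not values taken
    from the paper.
    """
    counts = Counter(doc_tokens)
    total = sum(counts.values())
    # Start from the maximum-likelihood estimate of the document model.
    topic = {w: c / total for w, c in counts.items()}
    floor = 1e-9  # small probability for terms missing from the background

    for _ in range(iters):
        # E-step: probability that an occurrence of w came from the topic model.
        resp = {}
        for w in counts:
            t = (1 - lam) * topic[w]
            b = lam * background.get(w, floor)
            resp[w] = t / (t + b)
        # M-step: re-estimate the topic model from the expected topic counts.
        norm = sum(counts[w] * resp[w] for w in counts)
        topic = {w: counts[w] * resp[w] / norm for w in counts}
    return topic


# Toy example: "the" is common in the background, "histone" is rare.
tokens = ["the"] * 5 + ["histone"] * 5
background = {"the": 0.5, "histone": 0.001}
topic = mixture_em(tokens, background)
print(sorted(topic.items(), key=lambda kv: -kv[1]))
```

In this toy run the two terms are equally frequent in the document, but the fitted topic model shifts probability mass toward "histone" because the background model already explains most occurrences of "the". That shift is the sense in which a mixture model can "purify" a document.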
Pages: 12 / +
Page count: 2
Related papers
50 items in total
  • [21] Term-specific language modeling approach to text categorization
    Kang, SS
    COMPUTATIONAL SCIENCE AND ITS APPLICATIONS - ICCSA 2004, PT 4, 2004, 3046 : 735 - 742
  • [22] A Text Segmentation Approach for Automated Annotation of Online Customer Reviews, Based on Topic Modeling
    Hananto, Valentinus Roby
    Serdult, Uwe
    Kryssanov, Victor
    APPLIED SCIENCES-BASEL, 2022, 12 (07):
  • [23] Best Practices for Text Annotation with Large Language Models
    Toernberg, Petter
    SOCIOLOGICA-INTERNATIONAL JOURNAL FOR SOCIOLOGICAL DEBATE, 2024, 18 (02): : 67 - 85
  • [24] Improving text mining with controlled natural language: A case study for protein interactions
    Kuhn, Tobias
    Royer, Loic
    Fuchs, Norbert E.
    Schroeder, Michael
    DATA INTEGRATION IN THE LIFE SCIENCES, PROCEEDINGS, 2006, 4075 : 66 - 81
  • [25] Semantic Annotation of Aerospace Problem Reports to Support Text Mining
    Malin, Jane T.
    Millward, Christopher
    Gomez, Fernando
    Throop, David R.
    IEEE INTELLIGENT SYSTEMS, 2010, 25 (05) : 20 - 26
  • [26] Text Mining for Protein Docking
    Badal, Varsha D.
    Kundrotas, Petras J.
    Vakser, Ilya A.
    PLOS COMPUTATIONAL BIOLOGY, 2015, 11 (12)
  • [27] AN APPROACH ON MULTILEVEL TEXT MINING
    Onet, Adrian
    KEPT 2009: KNOWLEDGE ENGINEERING PRINCIPLES AND TECHNIQUES, 2009, : 85 - 92
  • [28] A Survey of Topic Modeling in Text Mining
    Alghamdi, Rubayyi
    Alfalqi, Khalid
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2015, 6 (01) : 147 - 153
  • [29] The language of mathematics teaching: a text mining approach to explore the zeitgeist of US mathematics education
    Tracy E. Dobie
    Bruce Sherin
    Educational Studies in Mathematics, 2021, 107 : 159 - 188
  • [30] The language of mathematics teaching: a text mining approach to explore the zeitgeist of US mathematics education
    Dobie, Tracy E.
    Sherin, Bruce
    EDUCATIONAL STUDIES IN MATHEMATICS, 2021, 107 (01) : 159 - 188