A language modeling text mining approach to the annotation of protein community

Cited by: 0
Authors
Zhang, Xiaodan [1]
Wu, Daniel D. [1]
Zhou, Xiaohua [1]
Hu, Xiaohua [1]
Affiliations
[1] Drexel Univ, Coll Informat Sci & Techno, 3141 Chestnut, Philadelphia, PA 19104 USA
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010; 081704
Abstract
This paper presents an ontology-based, language-modeling text mining approach to annotating protein communities. Communities appear to play an important role in the functional properties of complex networks, and being able to annotate the community structure identified in a biological network can help us better understand the structure and dynamics of biological systems. Traditional resources such as the Gene Ontology (GO) provide information about the functionality of gene products, but they are not sufficient for annotating communities, for three reasons: the database covers only a limited number of proteins, only limited protein properties are available for annotation, and a group of gene products cannot be annotated as a whole. We therefore present an ontology-based mixture language model approach to annotating protein communities. Compared with traditional methods, the approach has three advantages. First, mining the biomedical literature yields much richer information than existing gene databases. Second, the mixture language model can help "purify" a document by eliminating some background noise. Third, using a domain ontology, we extract biological concepts and concept pairs from abstracts. Biological concepts are more meaningful than words or multi-word phrases, and concept pairs deliver considerably more information and serve as evidence for the annotation results. We test our approach on four communities, SAGA-SRB, CCR-NOT, RFC and ARP2/3, detected from a dataset of interactions for Saccharomyces cerevisiae from the General Repository for Interaction Datasets (GRID). The annotation results provide a very coherent indication of the functionality of each community.
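The abstract does not give the equations of the mixture language model, so the following is only a minimal sketch of one common formulation, not the authors' implementation. It assumes each token in a document is drawn from a fixed background model with probability `lam` and from an unknown topic model otherwise, and it fits the topic model with EM. The function name `estimate_topic_model`, the parameter names, and the toy token counts and background probabilities are all hypothetical.

```python
from collections import Counter


def estimate_topic_model(tokens, background, lam=0.5, iters=30, floor=1e-9):
    """Fit p(w | topic) for a two-component mixture with EM.

    Assumed model (an illustration, not taken from the paper):
        p(w) = lam * p(w | background) + (1 - lam) * p(w | topic)
    `background` maps words to fixed probabilities; words missing from
    it receive the small probability `floor`.
    """
    counts = Counter(tokens)
    vocab = list(counts)
    # Start from a uniform topic model over the document vocabulary.
    topic = {w: 1.0 / len(vocab) for w in vocab}
    for _ in range(iters):
        # E-step: posterior probability that an occurrence of w
        # came from the topic component rather than the background.
        post = {}
        for w in vocab:
            from_topic = (1.0 - lam) * topic[w]
            from_background = lam * background.get(w, floor)
            post[w] = from_topic / (from_topic + from_background)
        # M-step: re-estimate the topic model from the expected counts.
        total = sum(counts[w] * post[w] for w in vocab)
        topic = {w: counts[w] * post[w] / total for w in vocab}
    return topic


# Toy document: a frequent function word plus two content words.
doc = ["the"] * 5 + ["protein"] * 3 + ["histone"] * 2
# Hypothetical background probabilities for general English text.
bg = {"the": 0.6, "protein": 0.001, "histone": 0.001}
topic = estimate_topic_model(doc, bg)
for word, prob in sorted(topic.items(), key=lambda kv: -kv[1]):
    print(f"{word}\t{prob:.3f}")
```

In this toy run the function word "the" is the most frequent token, yet the background component absorbs much of its mass, so the content words rank above it in the fitted topic model. This is the sense in which a mixture model can "purify" a document.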
Pages: 12 / +
Page count: 2