Exploiting the Functional and Taxonomic Structure of Genomic Data by Probabilistic Topic Modeling

Cited by: 13
Authors
Chen, Xin [1]
Hu, Xiaohua [1]
Lim, Tze Y. [2]
Shen, Xiajiong [3]
Park, E. K. [4]
Rosen, Gail L. [5]
Affiliations
[1] Drexel Univ, Coll Informat Sci & Technol, Philadelphia, PA 19104 USA
[2] Drexel Univ, Dept Phys, Philadelphia, PA 19104 USA
[3] Henan Univ, Coll Comp & Informat Engn, Kaifeng, Henan, Peoples R China
[4] Calif State Univ Chico, Chico, CA 95929 USA
[5] Drexel Univ, Dept Elect & Comp Engn, Philadelphia, PA 19104 USA
Funding
US National Science Foundation;
Keywords
Data mining; bioinformatics (genome or protein) databases; language models; metagenomics; CLASSIFICATION; MICROBIOTA;
DOI
10.1109/TCBB.2011.113
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
In this paper, we present a method that enables both the homology-based approach and the composition-based approach to further study the functional core (i.e., the microbial core and the gene core, respectively). In the proposed method, major functionality groups are identified by generative topic modeling, which can extract useful information from unlabeled data. We first show that a generative topic model can be used to model the taxon abundance information obtained by the homology-based approach and to study the microbial core. The model considers each sample as a "document" that has a mixture of functional groups, while each functional group (also known as a "latent topic") is a weighted mixture of species. Therefore, estimating the generative topic model for taxon abundance data uncovers the distribution over latent functions (latent topics) in each sample. Second, we show that a generative topic model can also be used to study the genome-level composition of "N-mer" features (DNA subreads obtained by composition-based approaches). The model considers each genome as a mixture of latent genetic patterns (latent topics), while each such pattern is a weighted mixture of the "N-mer" features; thus, the existence of core genomes can be indicated by a set of common N-mer features. After studying the mutual information between latent topics and gene regions, we provide an explanation of the functional roles of the uncovered latent genetic patterns. The experimental results demonstrate the effectiveness of the proposed method.
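The generative view described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the group names, taxa, probabilities, and helper functions below are all hypothetical and chosen only for illustration. The sketch treats a sample as a mixture over functional groups (`theta`), each group as a weighted mixture over taxa (`topics`), and also shows how overlapping N-mer features might be counted from a sequence.

```python
import random
from collections import Counter

def expected_taxon_distribution(theta, topics):
    """Marginal p(taxon | sample) = sum over groups g of theta[g] * topics[g][taxon]."""
    p = Counter()
    for group, weight in theta.items():
        for taxon, prob in topics[group].items():
            p[taxon] += weight * prob
    return dict(p)

def sample_document(theta, topics, n_reads, rng):
    """Simulate taxon counts for one sample: for each read, draw a
    functional group from theta, then a taxon from that group's mixture."""
    groups = list(theta)
    group_weights = [theta[g] for g in groups]
    counts = Counter()
    for _ in range(n_reads):
        g = rng.choices(groups, weights=group_weights)[0]
        taxa = list(topics[g])
        taxon = rng.choices(taxa, weights=[topics[g][t] for t in taxa])[0]
        counts[taxon] += 1
    return counts

def nmer_counts(seq, n):
    """Count overlapping N-mers (length-n substrings) in a DNA sequence."""
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))

# Hypothetical toy parameters (assumptions, not values from the paper).
topics = {
    "group_A": {"Bacteroides": 0.7, "Prevotella": 0.3},
    "group_B": {"Prevotella": 0.2, "Faecalibacterium": 0.8},
}
theta = {"group_A": 0.6, "group_B": 0.4}

marginal = expected_taxon_distribution(theta, topics)
simulated = sample_document(theta, topics, n_reads=1000, rng=random.Random(0))
dimers = nmer_counts("ACGTAC", 2)
```

Estimating a topic model reverses this process: given only observed counts such as `simulated`, inference recovers plausible values for `theta` and `topics`. The sketch covers only the forward (generative) direction.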
Pages: 980 - 991
Page count: 12