Combining sequence and time series expression data to learn transcriptional modules

Cited by: 15
Authors
Kundaje, A [1]
Middendorf, M
Gao, F
Wiggins, C
Leslie, C
Affiliations
[1] Columbia Univ, Dept Comp Sci, New York, NY 10027 USA
[2] Columbia Univ, Dept Phys, New York, NY 10027 USA
[3] Columbia Univ, Dept Biol Sci, New York, NY 10027 USA
[4] Columbia Univ, Dept Appl Math, New York, NY 10027 USA
Keywords
gene regulation; clustering; heterogeneous data
DOI
10.1109/TCBB.2005.34
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Our goal is to cluster genes into transcriptional modules: sets of genes whose similarity in expression is explained by common regulatory mechanisms at the transcriptional level. We want to learn modules from both time series gene expression data and genome-wide motif data, which are now readily available for organisms such as S. cerevisiae as a result of prior computational studies or experimental results. We present a generative probabilistic model for combining regulatory sequence and time series expression data to cluster genes into coherent transcriptional modules. Starting with a set of motifs representing known or putative regulatory elements (transcription factor binding sites) and the counts of occurrences of these motifs in each gene's promoter region, together with a time series expression profile for each gene, the learning algorithm uses expectation maximization to learn module assignments based on both types of data. We also present a technique, based on the Jensen-Shannon entropy contributions of motifs in the learned model, for associating the most significant motifs with each module. Thus, the algorithm gives a global approach for associating sets of regulatory elements with "modules" of genes that have similar time series expression profiles. The model for expression data exploits our prior belief of smooth dependence on time by using statistical splines and is suitable for typical time course data sets with relatively few experiments. Moreover, the model is sufficiently interpretable that we can understand how both sequence data and expression data contribute to the cluster assignments, and how to interpolate between the two data sources. We present experimental results on the yeast cell cycle to validate our method and find that our combined expression and motif clustering algorithm discovers modules with both coherent expression and similar motif patterns, including binding motifs associated with known cell cycle transcription factors.
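The expectation maximization step described in the abstract can be illustrated with a minimal sketch. This is not the authors' model: it assumes a multinomial over motif counts and, in place of the statistical splines the abstract mentions, a plain Gaussian with one mean per time point and a fixed variance. The names `em_modules`, `farthest_point_seeds`, `sigma2`, and `pseudo`, and the toy data, are all hypothetical.

```python
import math

def farthest_point_seeds(expr, k):
    """Pick k seed genes: gene 0, then repeatedly the gene farthest from all seeds."""
    seeds = [0]
    while len(seeds) < k:
        def dist_to_seeds(g):
            return min(sum((a - b) ** 2 for a, b in zip(expr[g], expr[s]))
                       for s in seeds)
        seeds.append(max(range(len(expr)), key=dist_to_seeds))
    return seeds

def em_modules(counts, expr, k, n_iter=20, sigma2=1.0, pseudo=0.5):
    """Soft-cluster genes from motif counts and expression profiles (toy model).

    Assumed model (not from the paper): module j has mixing weight pi[j],
    a multinomial theta[j] over motifs, and a mean profile mu[j].
    """
    n_genes, n_motifs, n_times = len(counts), len(counts[0]), len(expr[0])
    pi = [1.0 / k] * k
    theta = [[1.0 / n_motifs] * n_motifs for _ in range(k)]
    mu = [list(expr[s]) for s in farthest_point_seeds(expr, k)]
    resp = []
    for _ in range(n_iter):
        # E-step: posterior responsibility of each module for each gene,
        # combining the motif-count and expression log-likelihoods.
        resp = []
        for g in range(n_genes):
            logp = []
            for j in range(k):
                motif_ll = sum(c * math.log(t) for c, t in zip(counts[g], theta[j]))
                expr_ll = -sum((x - m) ** 2
                               for x, m in zip(expr[g], mu[j])) / (2.0 * sigma2)
                logp.append(math.log(pi[j]) + motif_ll + expr_ll)
            top = max(logp)  # subtract the max for numerical stability
            weights = [math.exp(v - top) for v in logp]
            total = sum(weights)
            resp.append([w / total for w in weights])
        # M-step: re-estimate parameters from responsibility-weighted data.
        for j in range(k):
            r_sum = sum(resp[g][j] for g in range(n_genes)) + 1e-12  # guard empty module
            pi[j] = r_sum / n_genes
            raw = [pseudo + sum(resp[g][j] * counts[g][m] for g in range(n_genes))
                   for m in range(n_motifs)]
            norm = sum(raw)
            theta[j] = [v / norm for v in raw]
            mu[j] = [sum(resp[g][j] * expr[g][t] for g in range(n_genes)) / r_sum
                     for t in range(n_times)]
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return labels, theta, mu

# Toy data (invented): 4 genes, 2 motifs, 3 time points.
toy_counts = [[5, 0], [4, 1], [0, 5], [1, 4]]
toy_expr = [[1.0, 2.0, 1.0], [1.1, 1.9, 1.0],
            [-1.0, -2.0, -1.0], [-0.9, -2.1, -1.0]]
labels, theta, mu = em_modules(toy_counts, toy_expr, k=2)
print(labels)
```

Because the two log-likelihood terms are simply added in the E-step, the sketch makes visible how each data source contributes to a gene's module assignment. The learned `theta` rows could then feed a motif-ranking step such as the Jensen-Shannon based one the abstract describes.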
Pages: 194-202
Page count: 9