A graphical model method for integrating multiple sources of genome-scale data

Times cited: 5
Authors
Dvorkin, Daniel [1]
Biehs, Brian [2,3]
Kechris, Katerina [1,4]
Affiliations
[1] Univ Colorado, Sch Med, Computat Biosci Program, Aurora, CO 80045 USA
[2] Univ Calif San Francisco, Cardiovasc Res Inst, San Francisco, CA 94143 USA
[3] Univ Calif San Francisco, Dept Biochem & Biophys, San Francisco, CA 94143 USA
[4] Colorado Sch Publ Hlth, Dept Biostat & Informat, Aurora, CO 80045 USA
Keywords
data integration; genomics; graphical models; mixture models; GENE-EXPRESSION DATA; MIXTURE-MODELS; DNA-BINDING; CHIP-CHIP; DISCOVERY; IDENTIFICATION; TRANSCRIPTION; TARGETS; SAMPLES; DORSAL
DOI
10.1515/sagmb-2012-0051
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification codes
071010; 081704
Abstract
Making effective use of multiple data sources is a major challenge in modern bioinformatics. Genome-wide data such as measures of transcription factor binding, gene expression, and sequence conservation are used to identify binding regions and genes that are important to major biological processes such as development and disease. These heterogeneous data types differ in biological meaning and statistical distribution, which can make them difficult to use together, but each can provide valuable information for understanding the processes under study. Here we present methods for integrating multiple data sources to gain a more complete picture of gene regulation and expression. Our goal is to identify genes and cis-regulatory regions that play specific biological roles. We describe a graphical mixture model approach for data integration, examine the effect of using different model topologies, and discuss methods for evaluating the effectiveness of the models. Model fitting is computationally efficient and produces results that have clear biological and statistical interpretations. The Hedgehog and Dorsal signaling pathways in Drosophila, which are critical in embryonic development, are used as examples.
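The abstract does not specify the model's components or topologies. As a rough illustration of the general idea only, the sketch below fits the simplest possible graphical mixture topology: one hidden binary class per gene (for example "target" vs. "non-target") with two observed sources that are assumed conditionally independent given that class. A continuous score is modeled as Gaussian and a binary binding call as Bernoulli, and the parameters are estimated by expectation-maximization (EM). The function names, the choice of distributions, and the simulated data are hypothetical and are not taken from the paper.

```python
import math
import random


def log_normal_pdf(x, mu, sigma):
    """Log density of N(mu, sigma^2) evaluated at x."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))


def fit_two_class_mixture(scores, calls, n_iter=100):
    """Hypothetical sketch: EM for a two-class latent mixture over two data sources.

    scores: continuous value per gene (Gaussian within each class).
    calls:  0/1 indicator per gene (Bernoulli within each class).
    Sources are assumed conditionally independent given the hidden class.
    Returns (pi, mu, sigma, p, resp); resp[i] is P(class 1 | data for gene i).
    """
    n = len(scores)
    mean = sum(scores) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in scores) / n) or 1.0
    # Start class 1 as the "high score" component so the labels keep that meaning.
    pi, mu, sigma, p = 0.5, [mean - sd, mean + sd], [sd, sd], [0.25, 0.75]
    resp = [0.5] * n
    eps = 1e-6
    for _ in range(n_iter):
        # E-step: posterior probability of class 1 for each gene, computed in log space.
        for i in range(n):
            logl = []
            for k, prior in ((0, 1 - pi), (1, pi)):
                bern = p[k] if calls[i] else 1 - p[k]
                logl.append(math.log(prior)
                            + log_normal_pdf(scores[i], mu[k], sigma[k])
                            + math.log(bern))
            d = logl[0] - logl[1]
            resp[i] = 0.0 if d > 700 else 1.0 / (1.0 + math.exp(d))
        # M-step: weighted maximum-likelihood updates for each class.
        pi = min(max(sum(resp) / n, eps), 1 - eps)
        for k in (0, 1):
            w = [r if k == 1 else 1 - r for r in resp]
            wk = max(sum(w), eps)
            mu[k] = sum(wi * x for wi, x in zip(w, scores)) / wk
            var = sum(wi * (x - mu[k]) ** 2 for wi, x in zip(w, scores)) / wk
            sigma[k] = max(math.sqrt(var), 1e-3)
            p[k] = min(max(sum(wi * c for wi, c in zip(w, calls)) / wk, eps), 1 - eps)
    return pi, mu, sigma, p, resp


def simulate(n=300, frac_targets=0.3, seed=0):
    """Hypothetical synthetic data: targets have higher scores and more binding calls."""
    rng = random.Random(seed)
    truth, scores, calls = [], [], []
    for _ in range(n):
        z = 1 if rng.random() < frac_targets else 0
        truth.append(z)
        scores.append(rng.gauss(3.0 if z else 0.0, 1.0))
        calls.append(1 if rng.random() < (0.8 if z else 0.1) else 0)
    return truth, scores, calls
```

For example, `truth, scores, calls = simulate()` followed by `fit_two_class_mixture(scores, calls)` should recover one high-score, high-binding component, and `resp` then ranks genes by their posterior probability of belonging to it. Richer topologies would add dependencies between sources or more hidden states; this sketch covers only the fully factorized case.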
Pages: 469-487
Page count: 19