A Statistical Framework for the Analysis of ChIP-Seq Data

Cited by: 72
Authors
Kuan, Pei Fen [1 ,2 ,3 ]
Chung, Dongjun [2 ,3 ]
Pan, Guangjin [4 ,5 ,7 ]
Thomson, James A. [6 ,7 ]
Stewart, Ron [7 ]
Keles, Suenduez [2 ,3 ]
Affiliations
[1] Univ N Carolina, Dept Biostat, Chapel Hill, NC 27599 USA
[2] Univ Wisconsin, Dept Stat, Madison, WI 53706 USA
[3] Univ Wisconsin, Dept Biostat & Med Informat, Madison, WI 53706 USA
[4] Chinese Acad Sci, Guangzhou Inst Biomed, Guangzhou 510530, Guangdong, Peoples R China
[5] Chinese Acad Sci, Guangzhou Inst Hlth, Guangzhou 510530, Guangdong, Peoples R China
[6] Univ Wisconsin, Dept Anat, Genome Ctr Wisconsin, Madison, WI 53715 USA
[7] Univ Wisconsin, Morgridge Inst Res, Madison, WI 53715 USA
Keywords
GC content; Mappability; Mixture model; Negative binomial regression; Next generation sequencing; GENE-EXPRESSION; HIGH-RESOLUTION; BINDING; REGIONS; GENOME; SEQUENCE; REVEALS; MOTIF; DNA;
DOI
10.1198/jasa.2011.ap09706
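Chinese Library Classification (CLC) number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];
Discipline classification code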
020208 ; 070103 ; 0714 ;
Abstract
Chromatin immunoprecipitation followed by sequencing (ChIP-Seq) has revolutionized experiments for genome-wide profiling of DNA-binding proteins, histone modifications, and nucleosome occupancy. As the cost of sequencing is decreasing, many researchers are switching from microarray-based technologies (ChIP-chip) to ChIP-Seq for genome-wide study of transcriptional regulation. Despite its increasing and well-deserved popularity, there is little work that investigates and accounts for sources of biases in the ChIP-Seq technology. These biases typically arise from both the standard preprocessing protocol and the underlying DNA sequence of the generated data. We study data from a naked DNA sequencing experiment, which sequences noncross-linked DNA after deproteinizing and shearing, to understand factors affecting background distribution of data generated in a ChIP-Seq experiment. We introduce a background model that accounts for apparent sources of biases such as mappability and GC content and develop a flexible mixture model named MOSAiCS for detecting peaks in both one- and two-sample analyses of ChIP-Seq data. We illustrate that our model fits observed ChIP-Seq data well and further demonstrate advantages of MOSAiCS over commonly used tools for ChIP-Seq data analysis with several case studies. This article has supplementary material online.
Pages: 891-903
Page count: 13