A Statistical Framework for the Analysis of ChIP-Seq Data

Cited by: 72
Authors
Kuan, Pei Fen [1, 2, 3]
Chung, Dongjun [2, 3]
Pan, Guangjin [4, 5, 7]
Thomson, James A. [6, 7]
Stewart, Ron [7]
Keles, Suenduez [2, 3]
Affiliations
[1] Univ N Carolina, Dept Biostat, Chapel Hill, NC 27599 USA
[2] Univ Wisconsin, Dept Stat, Madison, WI 53706 USA
[3] Univ Wisconsin, Dept Biostat & Med Informat, Madison, WI 53706 USA
[4] Chinese Acad Sci, Guangzhou Inst Biomed, Guangzhou 510530, Guangdong, Peoples R China
[5] Chinese Acad Sci, Guangzhou Inst Hlth, Guangzhou 510530, Guangdong, Peoples R China
[6] Univ Wisconsin, Dept Anat, Genome Ctr Wisconsin, Madison, WI 53715 USA
[7] Univ Wisconsin, Morgridge Inst Res, Madison, WI 53715 USA
Keywords
GC content; Mappability; Mixture model; Negative binomial regression; Next generation sequencing; GENE-EXPRESSION; HIGH-RESOLUTION; BINDING; REGIONS; GENOME; SEQUENCE; REVEALS; MOTIF; DNA;
DOI
10.1198/jasa.2011.ap09706
Chinese Library Classification number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714
Abstract
Chromatin immunoprecipitation followed by sequencing (ChIP-Seq) has revolutionized experiments for genome-wide profiling of DNA-binding proteins, histone modifications, and nucleosome occupancy. As the cost of sequencing decreases, many researchers are switching from microarray-based technologies (ChIP-chip) to ChIP-Seq for genome-wide study of transcriptional regulation. Despite its increasing and well-deserved popularity, little work investigates and accounts for sources of bias in the ChIP-Seq technology. These biases typically arise from both the standard preprocessing protocol and the underlying DNA sequence of the generated data. We study data from a naked DNA sequencing experiment, which sequences non-cross-linked DNA after deproteinizing and shearing, to understand the factors affecting the background distribution of data generated in a ChIP-Seq experiment. We introduce a background model that accounts for apparent sources of bias such as mappability and GC content, and we develop a flexible mixture model named MOSAiCS for detecting peaks in both one- and two-sample analyses of ChIP-Seq data. We illustrate that our model fits observed ChIP-Seq data well and further demonstrate advantages of MOSAiCS over commonly used tools for ChIP-Seq data analysis with several case studies. This article has supplementary material online.
Pages: 891-903
Number of pages: 13