A Statistical Framework for the Analysis of ChIP-Seq Data

Cited by: 72
Authors
Kuan, Pei Fen [1, 2, 3]
Chung, Dongjun [2, 3]
Pan, Guangjin [4, 5, 7]
Thomson, James A. [6, 7]
Stewart, Ron [7]
Keles, Suenduez [2, 3]
Affiliations
[1] Univ N Carolina, Dept Biostat, Chapel Hill, NC 27599 USA
[2] Univ Wisconsin, Dept Stat, Madison, WI 53706 USA
[3] Univ Wisconsin, Dept Biostat & Med Informat, Madison, WI 53706 USA
[4] Chinese Acad Sci, Guangzhou Inst Biomed, Guangzhou 510530, Guangdong, Peoples R China
[5] Chinese Acad Sci, Guangzhou Inst Hlth, Guangzhou 510530, Guangdong, Peoples R China
[6] Univ Wisconsin, Dept Anat, Genome Ctr Wisconsin, Madison, WI 53715 USA
[7] Univ Wisconsin, Morgridge Inst Res, Madison, WI 53715 USA
Keywords
GC content; Mappability; Mixture model; Negative binomial regression; Next generation sequencing; GENE-EXPRESSION; HIGH-RESOLUTION; BINDING; REGIONS; GENOME; SEQUENCE; REVEALS; MOTIF; DNA
DOI
10.1198/jasa.2011.ap09706
Chinese Library Classification
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714
Abstract
Chromatin immunoprecipitation followed by sequencing (ChIP-Seq) has revolutionized experiments for genome-wide profiling of DNA-binding proteins, histone modifications, and nucleosome occupancy. As the cost of sequencing is decreasing, many researchers are switching from microarray-based technologies (ChIP-chip) to ChIP-Seq for genome-wide study of transcriptional regulation. Despite its increasing and well-deserved popularity, there is little work that investigates and accounts for sources of bias in the ChIP-Seq technology. These biases typically arise from both the standard preprocessing protocol and the underlying DNA sequence of the generated data. We study data from a naked DNA sequencing experiment, which sequences non-cross-linked DNA after deproteinizing and shearing, to understand factors affecting the background distribution of data generated in a ChIP-Seq experiment. We introduce a background model that accounts for apparent sources of bias such as mappability and GC content and develop a flexible mixture model named MOSAiCS for detecting peaks in both one- and two-sample analyses of ChIP-Seq data. We illustrate that our model fits observed ChIP-Seq data well and further demonstrate advantages of MOSAiCS over commonly used tools for ChIP-Seq data analysis with several case studies. This article has supplementary material online.
Pages: 891-903
Number of pages: 13