An Empirical Bayes approach for the identification of long-range chromosomal interaction from Hi-C data

Cited by: 0
Authors
Zhang, Qi [1]
Xu, Zheng [2]
Lai, Yutong [3]
Affiliations
[1] Univ New Hampshire, Dept Math & Stat, Durham, NH 03824 USA
[2] Wright State Univ, Dept Math & Stat, Dayton, OH 45435 USA
[3] ClinChoice, Ft Washington, PA 19034 USA
Keywords
empirical Bayes; epigenetics; Hi-C; peak identification; RHODOPSIN KINASE GENE; MODEL; NULL; ARCHITECTURE; GENOME; MAP
DOI
10.1515/sagmb-2020-0026
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
Hi-C experiments have become very popular for studying the 3D genome structure in recent years. Identification of long-range chromosomal interactions, i.e., peak detection, is crucial for Hi-C data analysis. However, it remains a challenging task due to the inherent high dimensionality, sparsity, and over-dispersion of the Hi-C count data matrix. We propose EBHiC, an empirical Bayes approach for peak detection from Hi-C data. The proposed framework provides flexible over-dispersion modeling by explicitly including the "true" interaction intensities as latent variables. To implement the proposed peak identification method (via the empirical Bayes test), we estimate the overall distribution of the observed counts semiparametrically using a Smoothed Expectation Maximization algorithm, and the empirical null based on the zero assumption. We conducted extensive simulations to validate and evaluate the performance of our proposed approach and applied it to real datasets. Our results suggest that EBHiC can identify better peaks in terms of accuracy, biological interpretability, and consistency across biological replicates. The source code is available on GitHub (https://github.com/QiZhangStat/EBHiC).
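The empirical Bayes test mentioned in the abstract compares a null distribution of counts with the overall distribution of the observed counts. The sketch below illustrates that general idea with a local false discovery rate, fdr(x) = pi0 * f0(x) / f(x). It is not the EBHiC implementation: the Poisson null, the raw empirical frequencies used for f, and the names `local_fdr`, `null_rate`, and `pi0` are assumptions made here for illustration, standing in for the paper's empirical null and its semiparametric Smoothed EM estimate.

```python
import math
from collections import Counter


def poisson_pmf(k, lam):
    """Poisson probability mass at k for rate lam (lam must be > 0)."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def local_fdr(counts, null_rate, pi0=1.0):
    """Illustrative local false discovery rate for each observed count.

    Assumptions (not taken from the paper):
      - the null distribution f0 is Poisson(null_rate);
      - the overall distribution f is the raw empirical frequency;
      - pi0 is the assumed proportion of null (non-peak) entries.
    """
    n = len(counts)
    freq = Counter(counts)
    result = []
    for x in counts:
        f = freq[x] / n                  # empirical estimate of f(x)
        f0 = poisson_pmf(x, null_rate)   # null probability of x
        result.append(min(1.0, pi0 * f0 / f))
    return result


# Toy contact counts: mostly background, plus one large count.
toy_counts = [0, 1, 1, 2, 1, 0, 2, 1, 30]
fdr = local_fdr(toy_counts, null_rate=1.0)
# Entries with a small local fdr are candidate peaks.
candidate_peaks = [x for x, v in zip(toy_counts, fdr) if v < 0.05]
```

In this toy input only the count of 30 is far out in the tail of the assumed null, so it is the only entry flagged; real Hi-C data would additionally require handling of genomic distance and over-dispersion, which this sketch ignores.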
Pages: 1-15
Number of pages: 15
Related papers
50 in total
  • [31] Unsupervised Learning from Noisy Networks with Applications to Hi-C Data
    Wang, Bo
    Zhu, Junjie
    Ursu, Oana
    Pourshafeie, Armin
    Batzoglou, Serafim
    Kundaje, Anshul
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 29 (NIPS 2016), 2016, 29
  • [32] Advancing chromosomal-scale, haplotype-resolved genome assembly: beading with Hi-C data
    Kesen Zhu
    Qingyun Li
    Qianqian Kong
    Junpeng Shi
    Advanced Biotechnology, 2 (3):
  • [33] Assembly of allele-aware, chromosomal-scale autopolyploid genomes based on Hi-C data
    Xingtan Zhang
    Shengcheng Zhang
    Qian Zhao
    Ray Ming
    Haibao Tang
    Nature Plants, 2019, 5 : 833 - 845
  • [34] Assembly of allele-aware, chromosomal-scale autopolyploid genomes based on Hi-C data
    Zhang, Xingtan
    Zhang, Shengcheng
    Zhao, Qian
    Ming, Ray
    Tang, Haibao
    NATURE PLANTS, 2019, 5 (08) : 833 - 845
  • [35] Block Bootstrap for the Empirical Process of Long-Range Dependent Data
    Tewes, Johannes
    JOURNAL OF TIME SERIES ANALYSIS, 2018, 39 (01) : 28 - 53
  • [36] Superconductivity from a long-range repulsive interaction
    Onari, S.
    Arita, R.
    Kuroki, K.
    Aoki, H.
    LOW TEMPERATURE PHYSICS, PTS A AND B, 2006, 850 : 559 - +
  • [37] ClusterTAD: an unsupervised machine learning approach to detecting topologically associated domains of chromosomes from Hi-C data
    Oluwadare, Oluwatosin
    Cheng, Jianlin
    BMC BIOINFORMATICS, 2017, 18
  • [38] Chromosome3D: reconstructing three-dimensional chromosomal structures from Hi-C interaction frequency data using distance geometry simulated annealing
    Badri Adhikari
    Tuan Trieu
    Jianlin Cheng
    BMC Genomics, 17
  • [39] Extracting multi-way chromatin contacts from Hi-C data
    Liu, Lei
    Zhang, Bokai
    Hyeon, Changbong
    PLOS COMPUTATIONAL BIOLOGY, 2021, 17 (12)
  • [40] Binless normalization of Hi-C data provides significant interaction and difference detection independent of resolution
    Yannick G. Spill
    David Castillo
    Enrique Vidal
    Marc A. Marti-Renom
    Nature Communications, 10