Effective normalization for copy number variation in Hi-C data

Cited by: 19
Authors
Servant, Nicolas [1 ,2 ,3 ]
Varoquaux, Nelle [4 ,5 ]
Heard, Edith [6 ]
Barillot, Emmanuel [1 ,2 ,3 ]
Vert, Jean-Philippe [1 ,2 ,3 ,7 ]
Affiliations
[1] PSL Res Univ, Inst Curie, F-75005 Paris, France
[2] INSERM, U900, F-75005 Paris, France
[3] PSL Res Univ, CBIO Ctr Computat Biol, Mines ParisTech, F-75006 Paris, France
[4] Univ Calif Berkeley, Dept Stat, Berkeley, CA 94720 USA
[5] Berkeley Inst Data Sci, Berkeley, CA USA
[6] PSL Res Univ, INSERM U934, CNRS UMR3215, Inst Curie, F-75005 Paris, France
[7] PSL Res Univ, Ecole Normale Super, Dept Math & Applicat, F-75005 Paris, France
Source
BMC BIOINFORMATICS | 2018 / Vol. 19
Keywords
Normalization; Hi-C; Cancer; Copy-number; TOPOLOGICAL DOMAINS; HUMAN GENOME; 3D GENOME; CANCER; ORGANIZATION; ARCHITECTURE; DISRUPTION; ACTIVATION; LANDSCAPE; ELEMENTS;
DOI
10.1186/s12859-018-2256-5
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Background: Normalization is essential to ensure accurate analysis and proper interpretation of sequencing data, and chromosome conformation capture data such as Hi-C have particular challenges. Although several methods have been proposed, the most widely used type of normalization of Hi-C data usually casts estimation of unwanted effects as a matrix balancing problem, relying on the assumption that all genomic regions interact equally with each other. Results: In order to explore the effect of copy-number variations on Hi-C data normalization, we first propose a simulation model that predicts the effects of large copy-number changes on a diploid Hi-C contact map. We then show that the standard approaches relying on equal visibility fail to correct for unwanted effects in the presence of copy-number variations. We thus propose a simple extension to matrix balancing methods that models these effects. Our approach can either retain the copy-number variation effects (LOIC) or remove them (CAIC). We show that this leads to better downstream analysis of the three-dimensional organization of rearranged genomes. Conclusions: Taken together, our results highlight the importance of using dedicated methods for the analysis of Hi-C cancer data. Both CAIC and LOIC methods perform well on simulated and real Hi-C data sets, each fulfilling different needs.
Pages: 16
Related papers
50 in total
  • [1] Effective normalization for copy number variation in Hi-C data
    Nicolas Servant
    Nelle Varoquaux
    Edith Heard
    Emmanuel Barillot
    Jean-Philippe Vert
    [J]. BMC Bioinformatics, 19
  • [2] Effectiveness of machine learning at modeling the relationship between Hi-C data and copy number variation
    Wang, Yuyang
    Sun, Yu
    Liu, Zeyu
    Chen, Bijia
    Chen, Hebing
    Ren, Chao
    Lin, Xuanwei
    Hu, Pengzhen
    Jia, Peiheng
    Xu, Xiang
    Xu, Kang
    Liu, Ximeng
    Li, Hao
    Bo, Xiaochen
    [J]. QUANTITATIVE BIOLOGY, 2024, 12 (03) : 231 - 244
  • [3] Effectiveness of machine learning at modeling the relationship between Hi-C data and copy number variation
    Yuyang Wang
    Yu Sun
    Zeyu Liu
    Bijia Chen
    Hebing Chen
    Chao Ren
    Xuanwei Lin
    Pengzhen Hu
    Peiheng Jia
    Xiang Xu
    Kang Xu
    Ximeng Liu
    Hao Li
    Xiaochen Bo
    [J]. Quantitative Biology, 2024, 12 (03) : 231 - 244
  • [4] A computational strategy to adjust for copy number in tumor Hi-C data
    Wu, Hua-Jun
    Michor, Franziska
    [J]. BIOINFORMATICS, 2016, 32 (24) : 3695 - 3701
  • [5] Comparison of normalization methods for Hi-C data
    Lyu, Hongqiang
    Liu, Erhu
    Wu, Zhifang
    [J]. BIOTECHNIQUES, 2020, 68 (02) : 56 - 64
  • [6] covNorm: An R package for coverage based normalization of Hi-C and capture Hi-C data
    Kim, Kyukwang
    Jung, Inkyung
    [J]. COMPUTATIONAL AND STRUCTURAL BIOTECHNOLOGY JOURNAL, 2021, 19 : 3149 - 3159
  • [7] Identification of copy number variations and translocations in cancer cells from Hi-C data
    Chakraborty, Abhijit
    Ay, Ferhat
    [J]. BIOINFORMATICS, 2018, 34 (02) : 338 - 345
  • [8] HiNT: a computational method for detecting copy number variations and translocations from Hi-C data
    Wang, Su
    Lee, Soohyun
    Chu, Chong
    Jain, Dhawal
    Kerpedjiev, Peter
    Nelson, Geoffrey M.
    Walsh, Jennifer M.
    Alver, Burak H.
    Park, Peter J.
    [J]. GENOME BIOLOGY, 2020, 21 (01)
  • [9] HiNT: a computational method for detecting copy number variations and translocations from Hi-C data
    Su Wang
    Soohyun Lee
    Chong Chu
    Dhawal Jain
    Peter Kerpedjiev
    Geoffrey M. Nelson
    Jennifer M. Walsh
    Burak H. Alver
    Peter J. Park
    [J]. Genome Biology, 21
  • [10] Hi-C as a tool for precise detection and characterisation of chromosomal rearrangements and copy number variation in human tumours
    Louise Harewood
    Kamal Kishore
    Matthew D. Eldridge
    Steven Wingett
    Danita Pearson
    Stefan Schoenfelder
    V. Peter Collins
    Peter Fraser
    [J]. Genome Biology, 18