Effective normalization for copy number variation detection from whole genome sequencing

Cited by: 17
Authors
Janevski, Angel [1]
Varadan, Vinay [1]
Kamalakaran, Sitharthan [1]
Banerjee, Nilanjana [1]
Dimitrova, Nevenka [1]
Affiliation
[1] Philips Res, Briarcliff Manor, NY 10510 USA
Source
BMC GENOMICS | 2012 / Vol. 13
Keywords
Copy Number Variation; Jaccard Index; Control Genome; Copy Number Variation Region; Copy Number Profile;
DOI
10.1186/1471-2164-13-S6-S16
CLC classification number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Discipline classification code
071005; 0836; 090102; 100705
Abstract
Background: Whole genome sequencing enables a high-resolution view of the human genome and provides unique insights into genome structure at an unprecedented scale. A number of tools exist to infer copy number variation (CNV) in the genome. These tools, while validated, include a number of parameters that can be configured for the genome data being analyzed. The algorithms allow for normalization to account for individual and population-specific effects on individual genome CNV estimates, but the impact of these choices on the estimated CNVs is not well characterized. We evaluate in detail the effect of normalization methodologies in two CNV algorithms, FREEC and CNV-seq, using whole genome sequencing data from 8 individuals spanning four populations.
Methods: We apply FREEC and CNV-seq to a sequencing data set consisting of 8 genomes. We use multiple configurations corresponding to different read-count normalization methodologies in FREEC, and statistically characterize the concordance of the CNV calls between FREEC configurations and the analogous output from CNV-seq. The normalization methodologies evaluated in FREEC are GC content, mappability, and control genome. We further stratify the concordance analysis within genic regions, non-genic regions, and a collection of validated variant regions.
Results: The GC content normalization methodology generates the highest number of altered copy number regions. Both mappability and control genome normalization reduce the total number and length of copy number regions. Measured against the calls obtained with GC content normalization, mappability normalization yields Jaccard indices in the 0.07 to 0.3 range, whereas control genome normalization yields Jaccard index values around 0.4. The most critical impact of using mappability as a normalization factor is a substantial reduction of deletion CNV calls. The output of another method based on control genome normalization, CNV-seq, resulted in comparable CNV call profiles and substantial agreement in variable gene and CNV region calls.
Conclusions: The choice of read-count normalization methodology has a substantial effect on CNV calls, and the use of genomic mappability or an appropriately chosen control genome can optimize the output of CNV analysis.
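The abstract reports concordance between CNV call sets as a Jaccard index. As a rough illustration only, the sketch below computes a base-pair-level Jaccard index (shared bases divided by bases covered by either call set) for two lists of intervals. The abstract does not state how the authors defined overlap, so the base-pair definition, the half-open interval convention, the single-chromosome restriction, and the names `merge`, `total_length`, and `jaccard` are all assumptions made for this example; the intervals in the usage note are invented, not data from the paper.

```python
from typing import List, Tuple

# Assumption: an interval is half-open [start, end) on a single chromosome.
Interval = Tuple[int, int]


def merge(intervals: List[Interval]) -> List[Interval]:
    """Sort intervals and merge any that overlap or touch."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Extend the previous interval instead of starting a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def total_length(intervals: List[Interval]) -> int:
    """Number of bases covered by a list of non-overlapping intervals."""
    return sum(end - start for start, end in intervals)


def jaccard(calls_a: List[Interval], calls_b: List[Interval]) -> float:
    """Base-pair Jaccard index: |A intersect B| / |A union B|."""
    a, b = merge(calls_a), merge(calls_b)
    union = total_length(merge(a + b))
    if union == 0:
        return 0.0  # both call sets are empty
    # Inclusion-exclusion: |A intersect B| = |A| + |B| - |A union B|
    intersection = total_length(a) + total_length(b) - union
    return intersection / union
```

With the hypothetical call sets `[(100, 200), (300, 400)]` and `[(150, 250), (300, 350)]`, the two sets share 100 bases out of 250 covered in total, so `jaccard` returns 0.4. A real comparison would run this per chromosome, or use an existing tool that handles multi-chromosome interval files.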
Pages: 11