CHARR efficiently estimates contamination from DNA sequencing data

Cited by: 3
Authors
Lu, Wenhan [1, 2, 3]
Gauthier, Laura D. [1, 4]
Poterba, Timothy [1, 2, 3]
Giacopuzzi, Edoardo [5]
Goodrich, Julia K. [1, 2]
Stevens, Christine R. [1, 2, 3]
King, Daniel [1, 2, 3]
Daly, Mark J. [1, 2, 3, 6]
Neale, Benjamin M. [1, 2, 3, 7]
Karczewski, Konrad J. [1, 2, 7]
Affiliations
[1] Broad Inst MIT & Harvard, Program Med & Populat Genet, Cambridge, MA 02142 USA
[2] Massachusetts Gen Hosp, Analyt & Translat Genet Unit, Boston, MA 02114 USA
[3] Broad Inst MIT & Harvard, Stanley Ctr Psychiat Res, Cambridge, MA 02142 USA
[4] Broad Inst MIT & Harvard, Data Sci Platform, Cambridge, MA 02142 USA
[5] Human Technopole, Viale Rita Levi Montalcini 1, I-20157 Milan, Italy
[6] Inst Mol Med Finland, Helsinki, Finland
[7] Broad Inst MIT & Harvard, Novo Nordisk Fdn Ctr Genom Mech Dis, Cambridge, MA 02142 USA
Keywords
SAMPLES
DOI
10.1016/j.ajhg.2023.10.011
Chinese Library Classification number
Q3 [Genetics]
Discipline classification codes
071007; 090102
Abstract
DNA sample contamination is a major issue in clinical and research applications of whole-genome and exome sequencing. Even modest levels of contamination can substantially affect the overall quality of variant calls and lead to widespread genotyping errors. Currently, popular tools for estimating the contamination level use short-read data (BAM/CRAM files), which are expensive to store and manipulate and are often not retained or shared widely. We propose CHARR (contamination from homozygous alternate reference reads), a metric that estimates DNA sample contamination from variant-level whole-genome and exome sequence data by leveraging the infiltration of reference reads within homozygous alternate variant calls. CHARR uses a small proportion of variant-level genotype information and thus can be computed from single-sample gVCFs or callsets in VCF or BCF formats, as well as from efficiently stored variant calls in Hail VariantDataset format. Our results demonstrate that CHARR accurately recapitulates results from existing tools at substantially reduced cost, improving the accuracy and efficiency of downstream analyses of ultra-large whole-genome and exome sequencing datasets.
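The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract states only that reference reads within homozygous alternate calls are used, so the exact formula, the division by reference allele frequency, the filter thresholds, and every name below (`HomAltCall`, `charr_like_estimate`, `min_depth`, `min_ref_af`, `max_ref_af`) are assumptions made for illustration. The reasoning assumed here is that a contaminating read carries the reference allele with probability roughly equal to the reference allele frequency, so the observed reference-read fraction at a homozygous alternate site is scaled by that frequency before averaging.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class HomAltCall:
    """One homozygous alternate genotype call (hypothetical record type)."""
    ref_reads: int   # reads supporting the reference allele
    alt_reads: int   # reads supporting the alternate allele
    ref_af: float    # population reference allele frequency at the site


def charr_like_estimate(
    calls: Iterable[HomAltCall],
    min_depth: int = 10,       # assumed depth filter, not from the abstract
    min_ref_af: float = 0.05,  # assumed frequency filter, not from the abstract
    max_ref_af: float = 0.95,
) -> Optional[float]:
    """Mean reference-read fraction at hom-alt sites, scaled by reference AF.

    Returns None when no call passes the filters.
    """
    ratios = []
    for call in calls:
        depth = call.ref_reads + call.alt_reads
        if depth < min_depth:
            continue  # too few reads for a stable fraction
        if not (min_ref_af <= call.ref_af <= max_ref_af):
            continue  # near-fixed sites carry little information
        # Assumption: expected ref fraction is about contamination * ref_af,
        # so dividing by ref_af gives a per-site contamination estimate.
        ratios.append(call.ref_reads / (call.ref_af * depth))
    if not ratios:
        return None
    return sum(ratios) / len(ratios)


if __name__ == "__main__":
    example_calls = [
        HomAltCall(ref_reads=1, alt_reads=49, ref_af=0.5),
        HomAltCall(ref_reads=0, alt_reads=40, ref_af=0.5),
    ]
    print(charr_like_estimate(example_calls))
```

Because only per-site allele depths and an allele frequency are needed, a calculation of this shape can run on variant-level files without revisiting the BAM/CRAM reads, which is the cost advantage the abstract describes.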
Pages: 2068-2076
Number of pages: 9