Whole-genome long-read sequencing downsampling and its effect on variant-calling precision and recall

Cited by: 1
Authors
Harvey, William T. [1]
Ebert, Peter [2, 3, 4]
Ebler, Jana [2, 4]
Audano, Peter A. [5]
Munson, Katherine M. [1]
Hoekzema, Kendra [1]
Porubsky, David [1]
Beck, Christine R. [5, 6]
Marschall, Tobias [2, 4]
Garimella, Kiran [7]
Eichler, Evan E. [1, 8]
Affiliations
[1] Univ Washington, Sch Med, Dept Genome Sci, Seattle, WA 98195 USA
[2] Heinrich Heine Univ, Inst Med Biometry & Bioinformat, Med Fac, D-40225 Dusseldorf, Germany
[3] Heinrich Heine Univ, Med Fac, Core Unit Bioinformat, D-40225 Dusseldorf, Germany
[4] Heinrich Heine Univ, Ctr Digital Med, D-40225 Dusseldorf, Germany
[5] Jackson Lab Genom Med, Farmington, CT 06032 USA
[6] Univ Connecticut, Hlth Ctr, Dept Genet & Genome Sci, Farmington, CT 06030 USA
[7] Broad Inst MIT & Harvard, Data Sci Platform, Cambridge, MA 02142 USA
[8] Univ Washington, Howard Hughes Med Inst, Seattle, WA 98195 USA
Funding
National Institutes of Health (USA);
Keywords
STRUCTURAL VARIATION;
DOI
10.1101/gr.278070.123
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010; 081704;
Abstract
Advances in long-read sequencing (LRS) technologies continue to make whole-genome sequencing more complete, affordable, and accurate. LRS provides significant advantages over short-read sequencing approaches, including phased de novo genome assembly, access to previously excluded genomic regions, and discovery of more complex structural variants (SVs) associated with disease. Limitations remain with respect to cost, scalability, and platform-dependent read accuracy, and the tradeoffs between sequence coverage and sensitivity of variant discovery are important experimental considerations for the application of LRS. We compare the genetic variant-calling precision and recall of Oxford Nanopore Technologies (ONT) and Pacific Biosciences (PacBio) HiFi platforms over a range of sequence coverages. For read-based applications, LRS sensitivity begins to plateau around 12-fold coverage, with a majority of variants called with reasonable accuracy (F1 score above 0.5), and both platforms perform well for SV detection. Genome assembly increases variant-calling precision and recall of SVs and indels in HiFi data sets, with HiFi outperforming ONT in quality as measured by the F1 score of assembly-based variant call sets. While both technologies continue to evolve, our work offers guidance to design cost-effective experimental strategies that do not compromise on discovering novel biology.
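The abstract reports results as precision, recall, and F1 score across sequence coverages. As a minimal sketch of how these quantities relate, the Python below computes them from true-positive, false-positive, and false-negative counts, and also computes the fraction of reads to keep when downsampling to a target coverage. The function names and the example counts are hypothetical illustrations, not the authors' pipeline or data.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) for a variant call set.

    tp: calls that match the truth set
    fp: calls absent from the truth set
    fn: truth-set variants that were not called
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def downsample_fraction(full_coverage: float, target_coverage: float) -> float:
    """Fraction of reads to retain so the expected coverage equals the target.

    Assumes reads are sampled uniformly at random, so expected coverage
    scales linearly with the fraction of reads kept.
    """
    if full_coverage <= 0:
        raise ValueError("full_coverage must be positive")
    if not 0 < target_coverage <= full_coverage:
        raise ValueError("target_coverage must be in (0, full_coverage]")
    return target_coverage / full_coverage


if __name__ == "__main__":
    # Hypothetical counts, chosen only to illustrate the arithmetic.
    p, r, f1 = precision_recall_f1(tp=80, fp=20, fn=20)
    print(f"precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
    # Hypothetical 30-fold data set downsampled to 12-fold.
    print(f"keep fraction={downsample_fraction(30.0, 12.0):.2f}")
```

Because F1 is a harmonic mean, it is pulled toward the lower of the two inputs, so a call set with high precision but poor recall (or the reverse) still scores low.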
Pages: 2029-2040
Page count: 12