DeNovoCNN: a deep learning approach to de novo variant calling in next generation sequencing data

Cited by: 9
Authors
Khazeeva, Gelana [1]
Sablauskas, Karolis [2]
van der Sanden, Bart [3]
Steyaert, Wouter [1]
Kwint, Michael [1]
Rots, Dmitrijs [3]
Hinne, Max [4]
van Gerven, Marcel [4]
Yntema, Helger [3]
Vissers, Lisenka [3]
Gilissen, Christian [1]
Affiliations
[1] Radboud Univ Nijmegen, Radboud Inst Mol Life Sci, Dept Human Genet, Med Ctr, Geert Grootepl 10, NL-6525 GA Nijmegen, Netherlands
[2] Vilnius Univ, Inst Clin Med, Fac Med, Vilnius, Lithuania
[3] Radboud Univ Nijmegen, Dept Human Genet, Donders Ctr Neurosci, Med Ctr, Geert Grootepl 10, NL-6525 GA Nijmegen, Netherlands
[4] Radboud Univ Nijmegen, Donders Inst Brain Cognit & Behav, Nijmegen, Netherlands
Keywords
DISCOVERY; FRAMEWORK
DOI
10.1093/nar/gkac511
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification code
071010; 081704
Abstract
De novo mutations (DNMs) are an important cause of genetic disorders. The accurate identification of DNMs from sequencing data is therefore fundamental to rare disease research and diagnostics. Unfortunately, identifying reliable DNMs remains a major challenge due to sequencing errors, uneven coverage, and mapping artifacts. Here, we developed DeNovoCNN, a deep convolutional neural network (CNN) DNM caller that encodes the alignment of sequence reads for a trio as images with a resolution of 160 x 164. DeNovoCNN was trained on DNMs from 5616 whole exome sequencing (WES) trios, achieving an overall recall of 96.74% and precision of 96.55% on the test dataset. We find that DeNovoCNN has higher recall (sensitivity) and precision than existing DNM calling approaches (GATK, DeNovoGear, DeepTrio, Samtools) on the Genome in a Bottle reference dataset and on independent WES and whole genome sequencing (WGS) trios. Validation of DNMs by Sanger and PacBio HiFi sequencing confirms that DeNovoCNN outperforms existing methods. Most importantly, our results suggest that DeNovoCNN is likely robust to different exome sequencing and analysis approaches, which allows it to be applied to other datasets. DeNovoCNN is freely available as a Docker container and can be run on existing alignment (BAM/CRAM) and variant calling (VCF) files from WES and WGS without re-calling variants.
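To make the recall and precision figures in the abstract concrete, the following minimal sketch shows how such metrics could be computed for a set of DNM calls against a truth set. It is not DeNovoCNN's method: the candidate filter is a deliberately naive read-count rule (variant supported in the child, no supporting reads in either parent), and the `TrioSite` structure, the thresholds, and the example sites are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrioSite:
    """Read counts at one candidate site in a trio (hypothetical structure)."""
    site: str         # illustrative identifier, e.g. "chr1:1000:A>G"
    child_alt: int    # reads supporting the alternate allele in the child
    child_depth: int  # total reads covering the site in the child
    father_alt: int   # alternate-allele reads in the father
    mother_alt: int   # alternate-allele reads in the mother


def naive_de_novo_candidates(sites, min_child_alt=5, min_child_vaf=0.25):
    """Naive rule (an assumption, not DeNovoCNN): keep sites where the child
    has enough alternate reads and neither parent has any."""
    candidates = set()
    for s in sites:
        if s.child_depth == 0:
            continue  # no coverage in the child, nothing to evaluate
        vaf = s.child_alt / s.child_depth
        if (s.child_alt >= min_child_alt and vaf >= min_child_vaf
                and s.father_alt == 0 and s.mother_alt == 0):
            candidates.add(s.site)
    return candidates


def precision_recall(called, truth):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    tp = len(called & truth)
    fp = len(called - truth)
    fn = len(truth - called)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Made-up example data, for illustration only.
sites = [
    TrioSite("chr1:1000:A>G", 12, 30, 0, 0),   # supported in child only
    TrioSite("chr2:2000:C>T", 15, 32, 14, 0),  # also seen in father: inherited
    TrioSite("chr3:3000:G>A", 2, 40, 0, 0),    # too few supporting reads
    TrioSite("chr4:4000:T>C", 9, 20, 0, 0),    # passes the rule, not in truth set
]
truth = {"chr1:1000:A>G", "chr3:3000:G>A"}

called = naive_de_novo_candidates(sites)
precision, recall = precision_recall(called, truth)
print(f"called={sorted(called)} precision={precision:.2f} recall={recall:.2f}")
```

In this toy example the rule produces one false positive and misses one low-support true DNM, which is the kind of trade-off that fixed read-count thresholds tend to show and that a learned caller aims to improve on.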
Pages: 10