NextSV: a meta-caller for structural variants from low-coverage long-read sequencing data

Cited by: 20
Authors
Fang, Li [1 ,2 ,3 ]
Hu, Jiang [1 ]
Wang, Depeng [1 ]
Wang, Kai [2 ,3 ,4 ,5 ]
Affiliations
[1] Grandomics Biosci, Beijing 102206, Peoples R China
[2] Childrens Hosp Philadelphia, Raymond G Perelman Ctr Cellular & Mol Therapeut, Philadelphia, PA 19104 USA
[3] Univ Penn, Dept Pathol & Lab Med, Perelman Sch Med, Philadelphia, PA 19104 USA
[4] Columbia Univ, Dept Biomed Informat, Med Ctr, New York, NY 10032 USA
[5] Columbia Univ, Inst Genom Med, Med Ctr, New York, NY 10032 USA
Source
BMC BIOINFORMATICS | 2018 / Vol. 19
Keywords
Long-read sequencing; Structural variants; Low coverage; PacBio; DE-NOVO MUTATIONS; HUMAN GENOME; DISEASE; MECHANISMS; CANCER;
DOI
10.1186/s12859-018-2207-1
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification codes
071010 ; 081704 ;
Abstract
Background: Structural variants (SVs) in human genomes are implicated in a variety of human diseases. Long-read sequencing produces much longer reads than short-read sequencing and may greatly improve SV detection. However, because long-read sequencing remains relatively expensive, it is unclear what coverage is needed and how best to use the available aligners and SV callers. Results: In this study, we developed NextSV, a meta-caller that performs SV calling on low-coverage long-read sequencing data. NextSV integrates three aligners and three SV callers and generates two integrated call sets (sensitive and stringent) for different analysis purposes. We evaluated the SV calling performance of NextSV at different PacBio coverages on two personal genomes, NA12878 and HX1. Compared with running any single SV caller, the NextSV stringent call set had higher precision and a higher F1 score (the harmonic mean of precision and recall, a balanced measure of accuracy), whereas the NextSV sensitive call set had higher recall. At 10X coverage, the recall of the NextSV sensitive call set was 93.5 to 94.1% for deletions and 87.9 to 93.2% for insertions, suggesting that ~10X coverage may be optimal in practice when balancing sequencing cost against recall. We further evaluated Mendelian errors on an Ashkenazi Jewish trio dataset. Conclusions: Our results provide practical guidelines for SV detection from low-coverage whole-genome PacBio data, and we expect that NextSV will facilitate the analysis of SVs in long-read sequencing data.
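To make the sensitive/stringent distinction and the reported metrics concrete, the following is a minimal Python sketch. It is not NextSV's actual implementation. It assumes that the sensitive set is the union and the stringent set the intersection of two callers' outputs (two rather than three, for brevity). It also assumes that two calls denote the same event when they share chromosome and SV type and their breakpoints lie within a hypothetical 500 bp window. The `SV` class, the function names, and the window size are all illustrative assumptions. Precision, recall, and F1 are computed against a truth set in the conventional way.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SV:
    """A simplified SV call (hypothetical representation, not NextSV's format)."""
    chrom: str
    pos: int     # breakpoint coordinate (start for deletions, insertion site for insertions)
    svtype: str  # e.g. "DEL" or "INS"


def same_event(a: SV, b: SV, window: int = 500) -> bool:
    """Assumed matching rule: same chromosome and type, breakpoints within `window` bp."""
    return a.chrom == b.chrom and a.svtype == b.svtype and abs(a.pos - b.pos) <= window


def has_match(call: SV, calls: list[SV], window: int = 500) -> bool:
    """True if any call in `calls` matches `call` under the assumed rule."""
    return any(same_event(call, other, window) for other in calls)


def sensitive_set(a: list[SV], b: list[SV], window: int = 500) -> list[SV]:
    """Union: every call from `a`, plus calls from `b` with no counterpart in `a`."""
    return list(a) + [c for c in b if not has_match(c, a, window)]


def stringent_set(a: list[SV], b: list[SV], window: int = 500) -> list[SV]:
    """Intersection: calls from `a` that are also supported by a call in `b`."""
    return [c for c in a if has_match(c, b, window)]


def evaluate(calls: list[SV], truth: list[SV], window: int = 500) -> tuple[float, float, float]:
    """Precision, recall and F1 (harmonic mean of precision and recall) against a truth set."""
    tp_calls = sum(has_match(c, truth, window) for c in calls)  # calls confirmed by the truth set
    tp_truth = sum(has_match(t, calls, window) for t in truth)  # truth events that were recovered
    precision = tp_calls / len(calls) if calls else 0.0
    recall = tp_truth / len(truth) if truth else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Toy data for illustration only; not taken from the study.
    truth = [SV("chr1", 1000, "DEL"), SV("chr1", 5000, "INS"),
             SV("chr2", 2000, "DEL"), SV("chr2", 9000, "INS")]
    caller_a = [SV("chr1", 1010, "DEL"), SV("chr1", 5100, "INS"), SV("chr3", 100, "DEL")]
    caller_b = [SV("chr1", 990, "DEL"), SV("chr2", 2050, "DEL"), SV("chr3", 7000, "INS")]
    for name, calls in [("sensitive", sensitive_set(caller_a, caller_b)),
                        ("stringent", stringent_set(caller_a, caller_b))]:
        print(name, tuple(round(x, 3) for x in evaluate(calls, truth)))
    # sensitive (0.6, 0.75, 0.667)
    # stringent (1.0, 0.25, 0.4)
```

On this toy data the stringent (intersection) set gains precision at the cost of recall, while the sensitive (union) set does the reverse, which mirrors the qualitative trade-off described in the abstract. The numbers themselves are illustrative only. A real pipeline would match SVs on reciprocal overlap and size as well as breakpoint position.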
Pages: 11