NextSV: a meta-caller for structural variants from low-coverage long-read sequencing data

Cited by: 20
Authors
Fang, Li [1, 2, 3]
Hu, Jiang [1]
Wang, Depeng [1]
Wang, Kai [2, 3, 4, 5]
Affiliations
[1] Grandomics Biosci, Beijing 102206, Peoples R China
[2] Childrens Hosp Philadelphia, Raymond G Perelman Ctr Cellular & Mol Therapeut, Philadelphia, PA 19104 USA
[3] Univ Penn, Dept Pathol & Lab Med, Perelman Sch Med, Philadelphia, PA 19104 USA
[4] Columbia Univ, Dept Biomed Informat, Med Ctr, New York, NY 10032 USA
[5] Columbia Univ, Inst Genom Med, Med Ctr, New York, NY 10032 USA
Source
BMC BIOINFORMATICS | 2018 / Volume 19
Keywords
Long-read sequencing; Structural variants; Low coverage; PacBio; DE-NOVO MUTATIONS; HUMAN GENOME; DISEASE; MECHANISMS; CANCER;
DOI
10.1186/s12859-018-2207-1
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Subject classification codes
071010 ; 081704 ;
Abstract
Background: Structural variants (SVs) in human genomes are implicated in a variety of human diseases. Long-read sequencing delivers much longer read lengths than short-read sequencing and may greatly improve SV detection. However, due to the relatively high cost of long-read sequencing, it is unclear what coverage is needed and how to optimally use the aligners and SV callers. Results: In this study, we developed NextSV, a meta-caller to perform SV calling from low-coverage long-read sequencing data. NextSV integrates three aligners and three SV callers and generates two integrated call sets (sensitive/stringent) for different analysis purposes. We evaluated the SV calling performance of NextSV under different PacBio coverages on two personal genomes, NA12878 and HX1. Our results showed that, compared with running any single SV caller, the NextSV stringent call set had higher precision and balanced accuracy (F1 score), while the NextSV sensitive call set had a higher recall. At 10X coverage, the recall of the NextSV sensitive call set was 93.5 to 94.1% for deletions and 87.9 to 93.2% for insertions, indicating that ~10X coverage might be an optimal coverage to use in practice, considering the balance between the sequencing costs and the recall rates. We further evaluated the Mendelian errors on an Ashkenazi Jewish trio dataset. Conclusions: Our results provide useful guidelines for SV detection from low-coverage whole-genome PacBio data, and we expect that NextSV will facilitate the analysis of SVs on long-read sequencing data.
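The abstract compares call sets by precision, recall, and F1 score. As a minimal sketch of how these three metrics relate, the Python function below computes them from counts of true-positive, false-positive, and false-negative calls. The function name `precision_recall_f1` and the example counts are hypothetical and are not taken from the paper; how a call is matched to a truth-set SV is also outside the scope of this sketch.

```python
def precision_recall_f1(true_positives, false_positives, false_negatives):
    """Return (precision, recall, F1) for one SV call set.

    Inputs are counts obtained by comparing a call set against a truth set.
    The matching criteria are an assumption left to the caller.
    """
    called = true_positives + false_positives   # all SVs reported by the caller
    truth = true_positives + false_negatives    # all SVs present in the truth set

    # Guard against empty call sets or empty truth sets.
    precision = true_positives / called if called else 0.0
    recall = true_positives / truth if truth else 0.0

    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts for illustration only (not results from the paper).
precision, recall, f1 = precision_recall_f1(90, 10, 30)
print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```

Under this definition, a call set that reports more SVs tends to gain recall at the cost of precision, which matches the sensitive/stringent trade-off the abstract describes.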
Pages: 11