SVsearcher: A more accurate structural variation detection method in long read data

Times cited: 3
Authors
Zheng, Yan [1]
Shang, Xuequn [1]
Sung, Wing-Kin [2,3,4]
Affiliations
[1] Northwestern Polytech Univ, Sch Comp Sci, West Youyi Rd 127, Xian 710072, Peoples R China
[2] Chinese Univ Hong Kong, Dept Chem Pathol, Hong Kong, Peoples R China
[3] Hong Kong Genome Inst, Shatin, Hong Kong Sci Pk, Hong Kong, Peoples R China
[4] Chinese Univ Hong Kong, Li Ka Shing Inst Hlth Sci, Lab Computat Genom, Hong Kong, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Long-read sequencing data; Structural variations; SV detection; PAIRED-END; VARIANTS; IMPACT; INDELS; CANCER
DOI
10.1016/j.compbiomed.2023.106843
CLC classification number
Q [Biological sciences]
Subject classification codes
07; 0710; 09
Abstract
Structural variations (SVs) are genomic rearrangements (such as deletions, insertions, and inversions) larger than 50 bp. They play important roles in genetic diseases and evolutionary mechanisms. Advances in long-read sequencing (e.g., PacBio and Oxford Nanopore (ONT) long-read sequencing) have made it possible to call SVs accurately. However, for ONT long reads, we observe that existing long-read SV callers miss many true SVs and report many false SVs in repetitive regions and in regions with multi-allelic SVs. These errors are caused by messy alignments of ONT reads, which result from their high error rate. Hence, we propose a novel method, SVsearcher, to address these issues. We run SVsearcher and other callers on three real datasets and find that SVsearcher improves the F1 score by approximately 10% for high-coverage (50x) datasets and by more than 25% for low-coverage (10x) datasets. More importantly, SVsearcher identifies 81.7%-91.8% of multi-allelic SVs, whereas existing methods identify only 13.2% (Sniffles) to 54.0% (nanoSV) of them. SVsearcher is available at https://github.com/kensung-lab/SVsearcher.
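The abstract reports caller accuracy as F1 scores. The Python sketch below shows one common way such a score can be computed for SV calls against a truth set. It is an illustration under stated assumptions and not the evaluation procedure used in the paper. The matching rule (same chromosome and SV type, start positions within a hypothetical 500 bp tolerance, greedy one-to-one pairing) and the names `SV`, `is_sv`, `match_calls`, and `f1_score` are invented for this example. The size filter follows the abstract's definition of an SV as larger than 50 bp.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SV:
    chrom: str    # chromosome name, e.g. "chr1"
    pos: int      # start coordinate
    svtype: str   # e.g. "DEL", "INS", "INV"
    length: int   # absolute SV length in bp


def is_sv(v: SV, min_len: int = 50) -> bool:
    """Keep only events larger than min_len bp (the abstract's SV definition)."""
    return v.length > min_len


def match_calls(calls, truth, max_dist=500):
    """Greedy one-to-one matching (assumed rule, not the paper's): each call
    takes the closest unused truth SV on the same chromosome with the same
    type whose start lies within max_dist bp. Returns the true-positive count."""
    used = set()
    tp = 0
    for c in calls:
        best = None  # (distance, truth index)
        for i, t in enumerate(truth):
            if i in used or t.chrom != c.chrom or t.svtype != c.svtype:
                continue
            d = abs(t.pos - c.pos)
            if d <= max_dist and (best is None or d < best[0]):
                best = (d, i)
        if best is not None:
            used.add(best[1])
            tp += 1
    return tp


def f1_score(calls, truth, max_dist=500):
    """Return (precision, recall, F1) after filtering both sets to SVs."""
    calls = [c for c in calls if is_sv(c)]
    truth = [t for t in truth if is_sv(t)]
    tp = match_calls(calls, truth, max_dist)
    fp = len(calls) - tp  # calls with no matching truth SV
    fn = len(truth) - tp  # truth SVs that no call recovered
    precision = tp / (tp + fp) if calls else 0.0
    recall = tp / (tp + fn) if truth else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    truth = [SV("chr1", 10_000, "DEL", 300), SV("chr1", 50_000, "INS", 120),
             SV("chr2", 7_000, "INV", 2_000), SV("chr2", 9_000, "DEL", 30)]
    calls = [SV("chr1", 10_120, "DEL", 290), SV("chr1", 50_900, "INS", 110),
             SV("chr2", 7_050, "INV", 1_950), SV("chr3", 1_000, "DEL", 80)]
    print(f1_score(calls, truth))
```

In the toy data, the 30 bp truth deletion is filtered out as too small. The insertion call lies 900 bp from its truth event and so exceeds the assumed tolerance. Two of four calls match, which gives precision 0.5, recall 2/3, and F1 = 4/7. Dedicated SV benchmarking tools use richer matching criteria, such as size and sequence similarity, so the tolerance here should be treated as a tunable assumption.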
Pages: 1-10
Number of pages: 10