VISTA: an integrated framework for structural variant discovery

Authors
Sarwal, Varuni [1]
Lee, Seungmo [1]
Yang, Jianzhi [2]
Sankararaman, Sriram [1]
Chaisson, Mark [2]
Eskin, Eleazar [1]
Mangul, Serghei [2, 3]
Affiliations
[1] Univ Calif Los Angeles, Dept Comp Sci, 580 Portola Pl, Los Angeles, CA 90095 USA
[2] Univ Southern Calif, Dana & David Dornsife Coll Letters Arts & Sci, Dept Quantitat & Computat Biol, 3540 S Figueroa St, Los Angeles, CA 90089 USA
[3] Univ Southern Calif, Alfred E Mann Sch Pharm, Dept Clin Pharm, 1540 Alcazar St, Los Angeles, CA 90033 USA
Keywords
bioinformatics; computational biology; machine learning; structural variation; copy number variation; de novo CNVs; paired-end; genome
DOI
10.1093/bib/bbae462
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Structural variation (SV) refers to insertions, deletions, inversions, and duplications in human genomes. SVs are present in approximately 1.5% of the human genome. Still, this small subset of genetic variation has been implicated in the pathogenesis of psoriasis, Crohn's disease and other autoimmune disorders, autism spectrum and other neurodevelopmental disorders, and schizophrenia. Because identifying SVs is an important problem in genetics, and advances in whole-genome sequencing (WGS) technologies allow SVs to be detected directly from sequencing data, a plethora of specialized SV detection methods have been developed. However, dissecting SVs from WGS data remains a challenge: most SV detection methods are prone to a high false-positive rate, and no existing method can precisely detect the full range of SVs present in a sample. Previous studies have shown that no existing SV caller maintains high accuracy across SV lengths and genomic coverages. Here, we report an integrated structural variant calling framework, Variant Identification and Structural Variant Analysis (VISTA), that leverages the results of individual callers using a novel and robust filtering and merging algorithm. Unlike existing consensus-based tools, which ignore variant length and coverage, VISTA executes combinations of top-performing callers chosen according to variant length and genomic coverage to generate SV events with high accuracy. We evaluated VISTA on comprehensive gold-standard datasets across organisms and coverages: the Genome-in-a-Bottle gold-standard SV set, haplotype-resolved de novo assemblies from the Human Pangenome Reference Consortium, and an in-house polymerase chain reaction (PCR)-validated mouse gold-standard set. VISTA maintained the highest F1 score among top consensus-based tools when measured against a comprehensive gold standard across both mouse and human genomes. VISTA also has an optimized mode, in which calls can be optimized for precision or recall. VISTA-optimized can attain 100% precision and the highest sensitivity among the variant callers compared. In conclusion, VISTA represents a significant advancement in structural variant calling, offering a robust and accurate framework that outperforms existing consensus-based tools and sets a new standard for SV detection in genomic research.
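The idea of choosing callers by variant length and coverage, merging their calls, and scoring the result by F1 can be illustrated with a minimal Python sketch. This is not VISTA's implementation: the caller names (`callerA`, `callerB`, `callerC`), the `TRUSTED` lookup table, the length and coverage cutoffs, and the 100 bp breakpoint tolerance are all hypothetical placeholders chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SV:
    chrom: str
    start: int
    end: int
    svtype: str = "DEL"

    @property
    def length(self) -> int:
        return self.end - self.start


# Hypothetical table: which callers to trust in each (length, coverage) bin.
# Names and bins are placeholders, not values taken from the paper.
TRUSTED = {
    ("short", "low"): {"callerA", "callerB"},
    ("short", "high"): {"callerA"},
    ("long", "low"): {"callerC"},
    ("long", "high"): {"callerB", "callerC"},
}


def bin_key(sv, coverage, length_cut=500, coverage_cut=10):
    """Map a call to its (length bin, coverage bin); cutoffs are assumed."""
    return (
        "short" if sv.length < length_cut else "long",
        "low" if coverage < coverage_cut else "high",
    )


def matches(a, b, tol=100):
    """Two calls match if type and chromosome agree and both breakpoints
    lie within `tol` base pairs of each other."""
    return (
        a.chrom == b.chrom
        and a.svtype == b.svtype
        and abs(a.start - b.start) <= tol
        and abs(a.end - b.end) <= tol
    )


def select_and_merge(calls_by_caller, coverage, tol=100):
    """Keep calls only from callers trusted for that call's bin, then
    collapse calls that match an already-kept call."""
    merged = []
    for caller, calls in calls_by_caller.items():
        for sv in calls:
            if caller not in TRUSTED[bin_key(sv, coverage)]:
                continue
            if not any(matches(sv, kept, tol) for kept in merged):
                merged.append(sv)
    return merged


def precision_recall_f1(calls, truth, tol=100):
    """Score calls against a gold-standard set using breakpoint matching."""
    true_pos = sum(any(matches(c, t, tol) for t in truth) for c in calls)
    found = sum(any(matches(c, t, tol) for c in calls) for t in truth)
    precision = true_pos / len(calls) if calls else 0.0
    recall = found / len(truth) if truth else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    calls = {
        "callerA": [SV("chr1", 1000, 1200), SV("chr1", 5000, 7000)],
        "callerB": [SV("chr1", 5010, 7020)],
        "callerC": [SV("chr1", 1005, 1190), SV("chr2", 100, 2000)],
    }
    truth = [SV("chr1", 1000, 1200), SV("chr1", 5000, 7000)]
    merged = select_and_merge(calls, coverage=30)
    print(merged)
    print(precision_recall_f1(merged, truth))
```

In this sketch, restricting each bin to fewer callers tends to favor precision, while trusting more callers per bin tends to favor recall, which is one way to picture the precision/recall trade-off of an optimized mode.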
Pages: 11