Haplotype-aware Variant Selection for Genome Graphs

Cited by: 1
Authors
Tavakoli, Neda [1]
Gibney, Daniel [1]
Aluru, Srinivas [1]
Affiliations
[1] Georgia Institute of Technology, School of Computational Science and Engineering, Atlanta, GA 30332, USA
Funding
U.S. National Science Foundation
Keywords
Variation graphs; variant selection; haplotype-aware; SNPs; ILP-based optimization; FRAMEWORK; ALGORITHM; ALIGNMENT;
DOI
10.1145/3535508.3545556
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Graph-based genome representations have proven to be a powerful tool in genomic analysis because they can encode the variations found in multiple haplotypes and capture population genetic diversity. Such graphs also unavoidably contain paths that switch between haplotypes (i.e., recombinant paths) and thus do not fully match any of the constituent haplotypes. The number of such recombinant paths increases combinatorially with path length and causes inefficiencies and false positives when mapping reads. In this paper, we study the problem of finding reduced haplotype-aware genome graphs that incorporate only a selected subset of variants, yet contain paths corresponding to all α-long substrings of the input haplotypes (i.e., non-recombinant paths) with at most δ mismatches. Solving this problem optimally, i.e., minimizing the number of variants selected, was previously shown to be NP-hard [14]. Here, we first establish several inapproximability results for finding haplotype-aware reduced variation graphs of optimal size. We then present an integer linear programming (ILP) formulation for solving the problem, and experimentally demonstrate that this approach is computationally feasible for real-world problems and provides far greater reduction than prior approaches.
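The selection constraint described in the abstract can be illustrated on a toy instance. The sketch below is not the authors' ILP formulation. It is a brute-force search under simplifying assumptions introduced here for illustration: all variants are SNPs on a linear reference, each haplotype is given as the set of positions where it carries the alternate allele, and dropping a variant costs one mismatch in every window that covers it on a haplotype carrying it. The function names and the parameters `alpha` (window length) and `delta` (mismatch budget) are hypothetical.

```python
from itertools import combinations


def feasible(haplotypes, kept, length, alpha, delta):
    """Check that every alpha-long window of every haplotype has at most
    delta dropped variants (assumption: one dropped SNP = one mismatch)."""
    for alt_positions in haplotypes:
        for start in range(length - alpha + 1):
            mismatches = sum(
                1
                for p in alt_positions
                if start <= p < start + alpha and p not in kept
            )
            if mismatches > delta:
                return False
    return True


def min_variant_selection(haplotypes, length, alpha, delta):
    """Return a smallest set of variant positions to keep.

    Exhaustive search over subsets by increasing size; exponential in the
    number of variants, so only suitable for toy inputs.
    """
    positions = sorted(set().union(*haplotypes))
    for k in range(len(positions) + 1):
        for subset in combinations(positions, k):
            if feasible(haplotypes, set(subset), length, alpha, delta):
                return set(subset)
    return set(positions)


if __name__ == "__main__":
    # Two hypothetical haplotypes over a reference of length 10.
    haps = [{1, 2, 8}, {2, 5}]
    print(min_variant_selection(haps, length=10, alpha=4, delta=1))  # {2}
```

With `alpha=4` and `delta=1`, keeping only the variant at position 2 suffices, because no window then contains two dropped variants on the same haplotype. Since the abstract notes the problem is NP-hard, exhaustive search like this does not scale, which is the motivation for an ILP solver on real data.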
Pages: 9