Haplotype-aware Variant Selection for Genome Graphs

Cited by: 1
Authors
Tavakoli, Neda [1]
Gibney, Daniel [1]
Aluru, Srinivas [1]
Affiliations
[1] Georgia Inst Technol, Sch Computat Sci & Engn, Atlanta, GA 30332 USA
Funding
National Science Foundation (USA)
Keywords
Variation graphs; variant selection; haplotype-aware; SNPs; ILP-based optimization; FRAMEWORK; ALGORITHM; ALIGNMENT;
DOI
10.1145/3535508.3545556
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Graph-based genome representations have proven to be a powerful tool in genomic analysis because they can encode the variations found in multiple haplotypes and capture population genetic diversity. Such graphs also unavoidably contain paths that switch between haplotypes (i.e., recombinant paths) and thus do not fully match any of the constituent haplotypes. The number of such recombinant paths increases combinatorially with path length and causes inefficiencies and false positives when mapping reads. In this paper, we study the problem of finding reduced haplotype-aware genome graphs that incorporate only a selected subset of variants, yet contain paths corresponding to all α-long substrings of the input haplotypes (i.e., non-recombinant paths) with at most δ mismatches. Solving this problem optimally, i.e., minimizing the number of variants selected, was previously shown to be NP-hard [14]. Here, we first establish several inapproximability results for finding haplotype-aware reduced variation graphs of optimal size. We then present an integer linear programming (ILP) formulation for solving the problem, and experimentally demonstrate that this is a computationally feasible approach for real-world problems and provides far superior reduction compared to prior approaches.
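Illustrative sketch (not from the paper): the selection problem stated in the abstract can be made concrete on a toy instance. The code below assumes a simplified SNP-only model in which each haplotype is given as the set of positions where it differs from the reference, and every dropped variant carried by a haplotype costs exactly one mismatch inside any window that covers it. It searches exhaustively instead of using the authors' ILP formulation, so it is only practical for a handful of variants. The function name min_variant_selection and its parameters are hypothetical.

```python
from itertools import combinations


def min_variant_selection(length, hap_variants, alpha, delta):
    """Return a smallest set of SNP positions to keep (toy model).

    length       -- reference length (assumed parameter)
    hap_variants -- list of sets; each set holds the positions where one
                    haplotype differs from the reference
    alpha        -- window (substring) length
    delta        -- maximum mismatches allowed per window
    """
    all_positions = sorted(set().union(*hap_variants))

    def feasible(kept):
        # Every alpha-long window of every haplotype may contain at most
        # delta variants that the haplotype carries but the graph dropped.
        for carried in hap_variants:
            dropped = carried - kept
            for start in range(max(length - alpha + 1, 0)):
                in_window = sum(1 for p in dropped if start <= p < start + alpha)
                if in_window > delta:
                    return False
        return True

    # Try subsets in order of increasing size, so the first feasible
    # subset found is one of minimum size.
    for k in range(len(all_positions) + 1):
        for kept in combinations(all_positions, k):
            if feasible(set(kept)):
                return set(kept)
    return set(all_positions)


# Two haplotypes over a reference of length 10.
haps = [{1, 2, 3}, {2, 8}]
print(min_variant_selection(10, haps, alpha=4, delta=1))
```

In this toy instance the three clustered variants at positions 1, 2 and 3 fall inside one window of length 4, so two of them must be kept when δ = 1, while the isolated variant at position 8 can be dropped. An ILP would express the same window constraints with one binary variable per variant.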
Pages: 9