Fast Estimation of Recombination Rates Using Topological Data Analysis

Times cited: 16
Authors
Humphreys, Devon P. [1]
McGuirl, Melissa R. [3]
Miyagi, Michael [4]
Blumberg, Andrew J. [2]
Affiliations
[1] Univ Texas Austin, Dept Integrat Biol, Austin, TX 78712 USA
[2] Univ Texas Austin, Dept Math, Austin, TX 78712 USA
[3] Brown Univ, Div Appl Math, Providence, RI 02912 USA
[4] Harvard Univ, Dept Organism & Evolutionary Biol, Cambridge, MA 02138 USA
Funding
National Institutes of Health; National Science Foundation
Keywords
recombination; topological data analysis; coalescent theory; population genetics; high-resolution; number; PRDM9; events; sample
DOI
10.1534/genetics.118.301565
Chinese Library Classification
Q3 [Genetics]
Subject classification codes
071007; 090102
Abstract
Accurate estimation of recombination rates is critical for studying the origins and maintenance of genetic diversity. Because inferring recombination rates under a full evolutionary model is computationally expensive, we developed an alternative approach that applies topological data analysis (TDA) to genome sequences. We find that this method can analyze datasets larger than any existing recombination inference software can handle, and that it achieves accuracy comparable to commonly used model-based methods in significantly less processing time. Previous TDA methods used information contained solely in the first Betti number (β1) of a set of genomes, which aims to capture the number of loops that can be detected within a genealogy. These explorations have proven difficult to connect to the theory of the underlying biological process of recombination and, consequently, behave unpredictably under perturbations of the data. We introduce a new topological feature, which we call ψ, with a natural connection to coalescent models, and present novel arguments relating β1 to population genetic models. Using simulations, we show that ψ and β1 are differentially affected by missing data, and we package our approach as TREE (Topological Recombination Estimator). TREE's efficiency and accuracy make it well suited as a first-pass estimator of recombination rate heterogeneity or hotspots throughout the genome. Our work empirically and theoretically justifies the use of topological statistics as summaries of genome sequences and describes a new, unintuitive relationship between topological features of the distribution of sequence data and the footprint of recombination on genomes.
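The abstract does not say how β1 is computed from a set of genomes. What follows is a minimal sketch that shows what "counting loops" means. It assumes Hamming distance between haplotypes and a Vietoris-Rips complex built at a single fixed scale eps, with GF(2) coefficients. These choices are illustrative assumptions, not necessarily TREE's pipeline. The sketch does not compute ψ. The function names hamming, gf2_rank and betti1 are hypothetical. The toy input uses the haplotypes 00, 01, 11 and 10. These are all four gametes at two sites, the classic footprint of recombination (or recurrent mutation) used by the four-gamete test, and they form one loop at scale 1.

```python
# Illustrative sketch (assumption): Hamming distance + Vietoris-Rips complex at ONE
# fixed scale eps, homology over GF(2). This is not TREE's actual implementation.
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of sites at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def gf2_rank(vectors):
    """Rank over GF(2) of vectors encoded as integer bitmasks (xor basis)."""
    basis = {}  # highest set bit -> basis vector with that leading bit
    for v in vectors:
        while v:
            lead = v.bit_length() - 1
            if lead in basis:
                v ^= basis[lead]  # eliminate the leading bit
            else:
                basis[lead] = v   # new independent vector
                break
    return len(basis)


def betti1(seqs, eps):
    """beta_1 of the Vietoris-Rips complex of `seqs` at scale `eps`."""
    n = len(seqs)
    # Edges: pairs of sequences within Hamming distance eps (stored as i < j).
    close = {(i, j) for i, j in combinations(range(n), 2)
             if hamming(seqs[i], seqs[j]) <= eps}
    edges = sorted(close)
    edge_index = {e: k for k, e in enumerate(edges)}
    # Triangles: triples whose three pairwise distances are all <= eps.
    triangles = [t for t in combinations(range(n), 3)
                 if all(p in close for p in combinations(t, 2))]
    # Boundary of edge (i, j) is vertex i + vertex j.
    d1 = [(1 << i) | (1 << j) for i, j in edges]
    # Boundary of triangle (i, j, k) is the sum of its three edges.
    d2 = [sum(1 << edge_index[p] for p in combinations(t, 2)) for t in triangles]
    # beta_1 = dim ker d1 - rank d2 = (#edges - rank d1) - rank d2
    return len(edges) - gf2_rank(d1) - gf2_rank(d2)


if __name__ == "__main__":
    four_gametes = ["00", "01", "11", "10"]
    print(betti1(four_gametes, 1))         # 1: a square loop, no filling triangles
    print(betti1(four_gametes, 2))         # 0: triangles fill the loop
    print(betti1(["00", "01", "11"], 1))   # 0: tree-like, no loop
```

The single-scale count is only the counting step. Persistent homology repeats it across a range of eps and records the scales at which loops appear and disappear.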
Pages: 1191-1204
Number of pages: 14