Fast Estimation of Recombination Rates Using Topological Data Analysis

Cited by: 16
Authors
Humphreys, Devon P. [1]
McGuirl, Melissa R. [3]
Miyagi, Michael [4]
Blumberg, Andrew J. [2]
Affiliations
[1] Univ Texas Austin, Dept Integrat Biol, Austin, TX 78712 USA
[2] Univ Texas Austin, Dept Math, Austin, TX 78712 USA
[3] Brown Univ, Div Appl Math, Providence, RI 02912 USA
[4] Harvard Univ, Dept Organism & Evolutionary Biol, Cambridge, MA 02138 USA
Funding
National Institutes of Health (USA); National Science Foundation (USA)
Keywords
recombination; topological data analysis; coalescent theory; population genetics; HIGH-RESOLUTION; NUMBER; PRDM9; EVENTS; SAMPLE;
DOI
10.1534/genetics.118.301565
Chinese Library Classification number
Q3 [Genetics]
Subject classification codes
071007; 090102
Abstract
Accurate estimation of recombination rates is critical for studying the origins and maintenance of genetic diversity. Because the inference of recombination rates under a full evolutionary model is computationally expensive, we developed an alternative approach using topological data analysis (TDA) on genome sequences. We find that this method can analyze datasets larger than what can be handled by any existing recombination inference software, and has accuracy comparable to commonly used model-based methods with significantly less processing time. Previous TDA methods used information contained solely in the first Betti number (β1) of a set of genomes, which aims to capture the number of loops that can be detected within a genealogy. These explorations have proven difficult to connect to the theory of the underlying biological process of recombination, and, consequently, have unpredictable behavior under perturbations of the data. We introduce a new topological feature, which we call ψ, with a natural connection to coalescent models, and present novel arguments relating β1 to population genetic models. Using simulations, we show that ψ and β1 are differentially affected by missing data, and package our approach as TREE (Topological Recombination Estimator). TREE's efficiency and accuracy make it well suited as a first-pass estimator of recombination rate heterogeneity or hotspots throughout the genome. Our work empirically and theoretically justifies the use of topological statistics as summaries of genome sequences and describes a new, unintuitive relationship between topological features of the distribution of sequence data and the footprint of recombination on genomes.
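As a rough illustration of how the first Betti number can register a loop in a set of sequences, the sketch below counts independent cycles in the graph that joins sequences within a chosen Hamming distance. It is not the TREE method and not persistent homology: the function names `hamming` and `cycle_rank`, the threshold `eps`, and the toy binary haplotypes are all assumptions made for this example. The graph cycle rank (edges minus vertices plus connected components) agrees with the first Betti number of the corresponding Vietoris-Rips complex only when no triangles are filled in, which holds for the distinct sequences used here.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def cycle_rank(seqs, eps):
    """Cycle rank (E - V + C) of the graph linking sequences at Hamming distance <= eps.

    Assumption for this sketch: sequences are distinct and no three are
    pairwise within eps, so no 2-simplices would fill any cycle.
    """
    n = len(seqs)
    parent = list(range(n))  # union-find forest for counting components

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = 0
    components = n
    for i, j in combinations(range(n), 2):
        if hamming(seqs[i], seqs[j]) <= eps:
            edges += 1
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[ri] = rj
                components -= 1
    return edges - n + components


# Three haplotypes explainable by mutation alone form a path: no loop.
tree_like = ["00", "01", "11"]
# Adding the fourth gamete "10" closes a 4-cycle, the classic recombination signal.
with_recombinant = ["00", "01", "11", "10"]

print(cycle_rank(tree_like, 1))
print(cycle_rank(with_recombinant, 1))
```

The toy data mirror the four-gamete condition: under an infinite-sites assumption, seeing all four two-site haplotypes cannot be explained by a tree, and the extra haplotype shows up as one additional cycle.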
Pages: 1191-1204
Number of pages: 14