Fast Estimation of Recombination Rates Using Topological Data Analysis

Cited by: 16
Authors
Humphreys, Devon P. [1]
McGuirl, Melissa R. [3]
Miyagi, Michael [4]
Blumberg, Andrew J. [2]
Affiliations
[1] Univ Texas Austin, Dept Integrat Biol, Austin, TX 78712 USA
[2] Univ Texas Austin, Dept Math, Austin, TX 78712 USA
[3] Brown Univ, Div Appl Math, Providence, RI 02912 USA
[4] Harvard Univ, Dept Organism & Evolutionary Biol, Cambridge, MA 02138 USA
Funding
US National Institutes of Health; US National Science Foundation;
Keywords
recombination; topological data analysis; coalescent theory; population genetics; HIGH-RESOLUTION; NUMBER; PRDM9; EVENTS; SAMPLE;
DOI
10.1534/genetics.118.301565
Chinese Library Classification number
Q3 [Genetics];
Discipline classification code
071007 ; 090102 ;
Abstract
Accurate estimation of recombination rates is critical for studying the origins and maintenance of genetic diversity. Because the inference of recombination rates under a full evolutionary model is computationally expensive, we developed an alternative approach using topological data analysis (TDA) on genome sequences. We find that this method can analyze datasets larger than what can be handled by any existing recombination inference software, and has accuracy comparable to commonly used model-based methods with significantly less processing time. Previous TDA methods used information contained solely in the first Betti number (β1) of a set of genomes, which aims to capture the number of loops that can be detected within a genealogy. These explorations have proven difficult to connect to the theory of the underlying biological process of recombination, and, consequently, have unpredictable behavior under perturbations of the data. We introduce a new topological feature, which we call ψ, with a natural connection to coalescent models, and present novel arguments relating β1 to population genetic models. Using simulations, we show that ψ and β1 are differentially affected by missing data, and package our approach as TREE (Topological Recombination Estimator). TREE's efficiency and accuracy make it well suited as a first-pass estimator of recombination rate heterogeneity or hotspots throughout the genome. Our work empirically and theoretically justifies the use of topological statistics as summaries of genome sequences and describes a new, unintuitive relationship between topological features of the distribution of sequence data and the footprint of recombination on genomes.
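To make the "loops" captured by the first Betti number concrete, the following is a minimal illustrative sketch and not the TREE implementation. It assumes haplotypes encoded as equal-length strings, Hamming distance between them, and a Vietoris-Rips complex built at a single fixed scale `eps`; none of these choices are stated in the abstract, and the function names (`hamming`, `gf2_rank`, `betti_1`) are hypothetical. The first Betti number is computed over GF(2) as (number of edges) minus rank of the edge boundary map minus rank of the triangle boundary map.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def gf2_rank(rows):
    """Rank over GF(2) of a matrix whose rows are integer bitmasks."""
    basis = {}  # leading bit position -> reduced row with that leading bit
    rank = 0
    for row in rows:
        while row:
            pivot = row.bit_length() - 1
            if pivot not in basis:
                basis[pivot] = row
                rank += 1
                break
            row ^= basis[pivot]  # eliminate the leading bit and keep reducing
    return rank


def betti_1(sequences, eps):
    """First Betti number of the Vietoris-Rips complex at scale eps (assumed setup)."""
    n = len(sequences)
    # Edges join sequences whose Hamming distance is at most eps.
    edges = [(i, j) for i, j in combinations(range(n), 2)
             if hamming(sequences[i], sequences[j]) <= eps]
    edge_index = {e: k for k, e in enumerate(edges)}
    # Triangles are filled in whenever all three of their edges are present.
    triangles = [(i, j, k) for i, j, k in combinations(range(n), 3)
                 if (i, j) in edge_index and (i, k) in edge_index
                 and (j, k) in edge_index]
    # Boundary of an edge: its two endpoints (bitmask over vertices).
    d1 = [(1 << i) | (1 << j) for i, j in edges]
    # Boundary of a triangle: its three edges (bitmask over edges).
    d2 = [(1 << edge_index[(i, j)]) | (1 << edge_index[(i, k)])
          | (1 << edge_index[(j, k)]) for i, j, k in triangles]
    # beta_1 = dim ker(d1) - rank(d2) = (#edges - rank d1) - rank d2
    return len(edges) - gf2_rank(d1) - gf2_rank(d2)


# All four two-site haplotypes present: a loop appears at scale 1.
four_gametes = ["00", "01", "10", "11"]
# Only three haplotypes: the data are tree-like and no loop appears.
tree_like = ["00", "01", "11"]

loop_count = betti_1(four_gametes, 1)
filled_count = betti_1(four_gametes, 2)
tree_count = betti_1(tree_like, 1)
```

In this toy example, the four haplotypes form a 4-cycle at scale 1, giving one loop, while at scale 2 every pair is connected and the filled triangles close the loop. The three-haplotype sample is a path and has no loop at scale 1. A production analysis would track such features across all scales (persistent homology) rather than at one fixed `eps`.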
Pages: 1191 - 1204
Number of pages: 14