Fast Estimation of Recombination Rates Using Topological Data Analysis

Cited by: 16
Authors
Humphreys, Devon P. [1]
McGuirl, Melissa R. [3]
Miyagi, Michael [4]
Blumberg, Andrew J. [2]
Affiliations
[1] Univ Texas Austin, Dept Integrat Biol, Austin, TX 78712 USA
[2] Univ Texas Austin, Dept Math, Austin, TX 78712 USA
[3] Brown Univ, Div Appl Math, Providence, RI 02912 USA
[4] Harvard Univ, Dept Organism & Evolutionary Biol, Cambridge, MA 02138 USA
Funding
National Institutes of Health (USA); National Science Foundation (USA);
Keywords
recombination; topological data analysis; coalescent theory; population genetics; HIGH-RESOLUTION; NUMBER; PRDM9; EVENTS; SAMPLE;
DOI
10.1534/genetics.118.301565
Chinese Library Classification number
Q3 [Genetics];
Subject classification code
071007; 090102;
Abstract
Accurate estimation of recombination rates is critical for studying the origins and maintenance of genetic diversity. Because the inference of recombination rates under a full evolutionary model is computationally expensive, we developed an alternative approach using topological data analysis (TDA) on genome sequences. We find that this method can analyze datasets larger than what can be handled by any existing recombination inference software, and has accuracy comparable to commonly used model-based methods with significantly less processing time. Previous TDA methods used information contained solely in the first Betti number (β₁) of a set of genomes, which aims to capture the number of loops that can be detected within a genealogy. These explorations have proven difficult to connect to the theory of the underlying biological process of recombination, and, consequently, have unpredictable behavior under perturbations of the data. We introduce a new topological feature, which we call ψ, with a natural connection to coalescent models, and present novel arguments relating β₁ to population genetic models. Using simulations, we show that ψ and β₁ are differentially affected by missing data, and package our approach as TREE (Topological Recombination Estimator). TREE's efficiency and accuracy make it well suited as a first-pass estimator of recombination rate heterogeneity or hotspots throughout the genome. Our work empirically and theoretically justifies the use of topological statistics as summaries of genome sequences and describes a new, unintuitive relationship between topological features of the distribution of sequence data and the footprint of recombination on genomes.
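To make the idea of a first Betti number of a set of genomes concrete, the following is a minimal sketch and not the TREE implementation. It assumes haplotypes are given as equal-length binary strings, builds a Vietoris-Rips complex at a single fixed Hamming-distance scale (rather than the full persistent homology a TDA pipeline would normally compute), and counts independent loops with coefficients in GF(2). The function names `hamming`, `rank_gf2`, and `betti_1` are hypothetical and chosen only for this illustration.

```python
from itertools import combinations


def hamming(a, b):
    """Number of sites at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def rank_gf2(rows):
    """Rank over GF(2) of a matrix whose rows are integer bitmasks."""
    pivots = {}  # highest set bit -> stored row with that leading bit
    rank = 0
    for row in rows:
        while row:
            high = row.bit_length() - 1
            if high not in pivots:
                pivots[high] = row
                rank += 1
                break
            row ^= pivots[high]  # cancel the leading bit and keep reducing
    return rank


def betti_1(seqs, eps):
    """First Betti number of the Vietoris-Rips complex at scale eps.

    Illustrative assumption: a single fixed scale, GF(2) coefficients.
    """
    n = len(seqs)
    # Edges join sequences within Hamming distance eps.
    edges = [(i, j) for i, j in combinations(range(n), 2)
             if hamming(seqs[i], seqs[j]) <= eps]
    edge_index = {e: k for k, e in enumerate(edges)}
    # Triangles are triples whose three edges are all present.
    triangles = [t for t in combinations(range(n), 3)
                 if all(p in edge_index for p in combinations(t, 2))]
    # Boundary of an edge: its two endpoints.
    d1 = [(1 << i) | (1 << j) for i, j in edges]
    # Boundary of a triangle: its three edges.
    d2 = [sum(1 << edge_index[p] for p in combinations(t, 2))
          for t in triangles]
    # beta_1 = dim ker(d1) - rank(d2) = (#edges - rank d1) - rank d2
    return len(edges) - rank_gf2(d1) - rank_gf2(d2)


# All four gametes at two sites: incompatible with a tree under infinite sites.
four_gametes = ["00", "01", "11", "10"]
print(betti_1(four_gametes, 1))  # 1: the four haplotypes form a loop
print(betti_1(four_gametes, 2))  # 0: at the larger scale the loop is filled in

# Three gametes only: consistent with a tree, so no loop appears.
print(betti_1(["00", "01", "10"], 1))  # 0
```

The four-gamete configuration is the classic signature of recombination under the infinite-sites model, which is why it yields a loop here; real data would call for a persistent homology library and a scan over scales rather than this single-scale count.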
Pages: 1191-1204
Number of pages: 14