FAST INFERENCE OF INDIVIDUAL ADMIXTURE COEFFICIENTS USING GEOGRAPHIC DATA

Cited by: 47
Authors
Caye, Kevin [1]
Jay, Flora [2]
Michel, Olivier [3]
Francois, Olivier [1]
Affiliations
[1] Univ Grenoble Alpes, CNRS, TIMC, IMAG, UMR 5525, F-38042 Grenoble, France
[2] Univ Paris Sud, Univ Paris Saclay, CNRS, Lab Rech Informat, UMR 7206, UMR 8623, F-91400 Orsay, France
[3] Univ Grenoble Alpes, CNRS, GIPSA Lab, UMR 5216, F-38042 Grenoble, France
Source
ANNALS OF APPLIED STATISTICS | 2018 / Vol. 12 / No. 1
Keywords
Ancestry estimation algorithms; genotypic data; geographic data; fast algorithms; SPATIAL POPULATION-STRUCTURE; GENOME SCANS; ANCESTRY; LOCALIZATION; ADAPTATION; COMPONENTS; SAMPLES; MODELS; CHOICE; NUMBER
DOI
10.1214/17-AOAS1106
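Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714
Abstract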
Accurately evaluating the distribution of genetic ancestry across geographic space is one of the main questions addressed by evolutionary biologists. This question has been commonly addressed through the application of Bayesian estimation programs allowing their users to estimate individual admixture proportions and allele frequencies among putative ancestral populations. Following the explosion of high-throughput sequencing technologies, several algorithms have been proposed to cope with the computational burden generated by the massive data in those studies. In this context, incorporating geographic proximity in ancestry estimation algorithms is an open statistical and computational challenge. In this study, we introduce new algorithms that use geographic information to estimate ancestry proportions and ancestral genotype frequencies from population genetic data. Our algorithms combine matrix factorization methods and spatial statistics to provide estimates of ancestry matrices based on least-squares approximation. We demonstrate the benefit of using spatial algorithms through extensive computer simulations, and we provide an example of application of our new algorithms to a set of spatially referenced samples for the plant species Arabidopsis thaliana. Without loss of statistical accuracy, the new algorithms exhibit runtimes that are much shorter than those observed for previously developed spatial methods. Our algorithms are implemented in the R package tess3r.
Pages: 586-608
Number of pages: 23