FAST INFERENCE OF INDIVIDUAL ADMIXTURE COEFFICIENTS USING GEOGRAPHIC DATA

Cited by: 47
Authors
Caye, Kevin [1]
Jay, Flora [2]
Michel, Olivier [3]
Francois, Olivier [1]
Affiliations
[1] Univ Grenoble Alpes, CNRS, TIMC, IMAG,UMR 5525, F-38042 Grenoble, France
[2] Univ Paris Sud, Univ Paris Saclay, CNRS, Lab Rech Informat,UMR 7206,UMR 8623, F-91400 Orsay, France
[3] Univ Grenoble Alpes, CNRS, GIPSA Lab, UMR 5216, F-38042 Grenoble, France
Source
ANNALS OF APPLIED STATISTICS | 2018 / Vol. 12 / No. 01
Keywords
Ancestry estimation algorithms; genotypic data; geographic data; fast algorithms; SPATIAL POPULATION-STRUCTURE; GENOME SCANS; ANCESTRY; LOCALIZATION; ADAPTATION; COMPONENTS; SAMPLES; MODELS; CHOICE; NUMBER;
DOI
10.1214/17-AOAS1106
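Chinese Library Classification (CLC) number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];
Subject classification code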
020208 ; 070103 ; 0714 ;
Abstract
Accurately evaluating the distribution of genetic ancestry across geographic space is one of the main questions addressed by evolutionary biologists. This question has been commonly addressed through the application of Bayesian estimation programs allowing their users to estimate individual admixture proportions and allele frequencies among putative ancestral populations. Following the explosion of high-throughput sequencing technologies, several algorithms have been proposed to cope with the computational burden generated by the massive data in those studies. In this context, incorporating geographic proximity in ancestry estimation algorithms is an open statistical and computational challenge. In this study, we introduce new algorithms that use geographic information to estimate ancestry proportions and ancestral genotype frequencies from population genetic data. Our algorithms combine matrix factorization methods and spatial statistics to provide estimates of ancestry matrices based on least-squares approximation. We demonstrate the benefit of using spatial algorithms through extensive computer simulations, and we provide an example of application of our new algorithms to a set of spatially referenced samples for the plant species Arabidopsis thaliana. Without loss of statistical accuracy, the new algorithms exhibit runtimes that are much shorter than those observed for previously developed spatial methods. Our algorithms are implemented in the R package tess3r.
Pages: 586 - 608
Page count: 23
Related papers
50 in total
  • [31] eSMC: a statistical model to infer admixture events from individual genomics data
    Wang, Yonghui
    Zhao, Zicheng
    Miao, Xinyao
    Wang, Yinan
    Qian, Xiaobo
    Chen, Lingxi
    Wang, Changfa
    Li, Shuaicheng
    BMC GENOMICS, 2022, 23 (SUPPL 4)
  • [32] eSMC: a statistical model to infer admixture events from individual genomics data
    Yonghui Wang
    Zicheng Zhao
    Xinyao Miao
    Yinan Wang
    Xiaobo Qian
    Lingxi Chen
    Changfa Wang
    Shuaicheng Li
    BMC Genomics, 23
  • [33] Activity coefficients of individual ions from titration data
    Jano, I
    Jarvis, T
    JOURNAL OF SOLUTION CHEMISTRY, 2002, 31 (04) : 317 - 339
  • [34] Activity Coefficients of Individual Ions from Titration Data
    Issam Jano
    Tammy Jarvis
    Journal of Solution Chemistry, 2002, 31 : 317 - 339
  • [35] A computationally fast estimator for random coefficients logit demand models using aggregate data
    Lee, Jinhyuk
    Seo, Kyoungwon
    RAND JOURNAL OF ECONOMICS, 2015, 46 (01) : 86 - 102
  • [36] A fast bootstrap algorithm for causal inference with large data
    Kosko, Matthew
    Wang, Lin
    Santacatterina, Michele
    STATISTICS IN MEDICINE, 2024, 43 (15) : 2894 - 2927
  • [37] Fast, fully Bayesian spatiotemporal inference for fMRI data
    Musgrove, Donald R.
    Hughes, John
    Eberly, Lynn E.
    BIOSTATISTICS, 2016, 17 (02) : 291 - 303
  • [38] Primer: Fast Private Transformer Inference on Encrypted Data
    Zheng, Mengxin
    Lou, Qian
    Jiang, Lei
    2023 60TH ACM/IEEE DESIGN AUTOMATION CONFERENCE, DAC, 2023
  • [39] Gaussianization for fast and accurate inference from cosmological data
    Schuhmann, Robert L.
    Joachimi, Benjamin
    Peiris, Hiranya V.
    MONTHLY NOTICES OF THE ROYAL ASTRONOMICAL SOCIETY, 2016, 459 (02) : 1916 - 1928
  • [40] Novel linkage of individual and geographic data to study firearm violence
    Branas, Charles C.
    Culhane, Dennis
    Richmond, Therese S.
    Wiebe, Douglas J.
    HOMICIDE STUDIES, 2008, 12 (03) : 298 - 320