FAST INFERENCE OF INDIVIDUAL ADMIXTURE COEFFICIENTS USING GEOGRAPHIC DATA

Times cited: 47
Authors
Caye, Kevin [1]
Jay, Flora [2]
Michel, Olivier [3]
Francois, Olivier [1]
Affiliations
[1] Univ Grenoble Alpes, CNRS, TIMC, IMAG,UMR 5525, F-38042 Grenoble, France
[2] Univ Paris Sud, Univ Paris Saclay, CNRS, Lab Rech Informat,UMR 7206,UMR 8623, F-91400 Orsay, France
[3] Univ Grenoble Alpes, CNRS, GIPSA Lab, UMR 5216, F-38042 Grenoble, France
Source
ANNALS OF APPLIED STATISTICS | 2018 / Vol. 12 / No. 1
Keywords
Ancestry estimation algorithms; genotypic data; geographic data; fast algorithms; SPATIAL POPULATION-STRUCTURE; GENOME SCANS; ANCESTRY; LOCALIZATION; ADAPTATION; COMPONENTS; SAMPLES; MODELS; CHOICE; NUMBER
DOI
10.1214/17-AOAS1106
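Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714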
Abstract
Accurately evaluating the distribution of genetic ancestry across geographic space is one of the main questions addressed by evolutionary biologists. This question has commonly been addressed through Bayesian estimation programs that allow their users to estimate individual admixture proportions and allele frequencies among putative ancestral populations. Following the explosion of high-throughput sequencing technologies, several algorithms have been proposed to cope with the computational burden generated by the massive data sets in those studies. In this context, incorporating geographic proximity in ancestry estimation algorithms is an open statistical and computational challenge. In this study, we introduce new algorithms that use geographic information to estimate ancestry proportions and ancestral genotype frequencies from population genetic data. Our algorithms combine matrix factorization methods and spatial statistics to provide estimates of ancestry matrices based on least-squares approximation. We demonstrate the benefit of using spatial algorithms through extensive computer simulations, and we apply our new algorithms to a set of spatially referenced samples of the plant species Arabidopsis thaliana. Without loss of statistical accuracy, the new algorithms exhibit runtimes that are much shorter than those observed for previously developed spatial methods. Our algorithms are implemented in the R package tess3r.
Pages: 586–608
Page count: 23
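The abstract describes the estimator only in words: a least-squares matrix factorization combined with spatial statistics. The sketch below shows one way such a spatially penalized factorization can be written. It is an illustration under stated assumptions and is not the tess3r implementation. It assumes the following:

- a criterion of the form ||X − QGᵀ||²_F + λ·tr(QᵀLQ), where X (n × m) holds genotypes scaled to [0, 1], Q (n × K) holds ancestry coefficients on the simplex, and G (m × K) holds ancestral frequencies;
- L = D − W is a graph Laplacian with Gaussian weights w_ij = exp(−d_ij²/σ²), so that tr(QᵀLQ) = ½ Σ w_ij ‖q_i − q_j‖² pulls nearby individuals toward similar ancestry;
- both factors are fitted by alternating projected-gradient steps with backtracking, which stand in for the paper's own solvers.

The function names, the parameters `lam` and `sigma`, the [0, 1] clipping of G, and the toy data are all hypothetical.

```python
import math
import random

# Hypothetical sketch: spatially penalized least-squares factorization X ~ Q G^T.


def matmul(A, B):
    """Dense product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(r) for r in zip(*A)]


def laplacian(coords, sigma):
    """Graph Laplacian L = D - W, assuming Gaussian weights w_ij = exp(-d_ij^2 / sigma^2)."""
    n = len(coords)
    W = [[0.0 if i == j else
          math.exp(-((coords[i][0] - coords[j][0]) ** 2
                     + (coords[i][1] - coords[j][1]) ** 2) / sigma ** 2)
          for j in range(n)] for i in range(n)]
    return [[sum(W[i]) if i == j else -W[i][j] for j in range(n)] for i in range(n)]


def objective(X, Q, G, L, lam):
    """Least-squares fit ||X - Q G^T||_F^2 plus spatial penalty lam * tr(Q^T L Q)."""
    fit = sum((x - y) ** 2 for xr, yr in zip(X, matmul(Q, transpose(G))) for x, y in zip(xr, yr))
    pen = sum(q * v for qr, vr in zip(Q, matmul(L, Q)) for q, v in zip(qr, vr))
    return fit + lam * pen


def project_simplex(v):
    """Euclidean projection of v onto {q >= 0, sum(q) = 1} (sort-and-threshold method)."""
    css, theta = 0.0, 0.0
    for k, u in enumerate(sorted(v, reverse=True), start=1):
        css += u
        if u - (css - 1.0) / k > 0:
            theta = (css - 1.0) / k
    return [max(x - theta, 0.0) for x in v]


def project_rows(M):
    return [project_simplex(r) for r in M]


def clip01(M):
    """Keep ancestral frequencies in [0, 1] (a simplified constraint on G)."""
    return [[min(1.0, max(0.0, v)) for v in r] for r in M]


def pg_step(f, P, grad, project, step=1.0):
    """Projected gradient step; halve the step until the objective does not increase."""
    f0 = f(P)
    while step > 1e-12:
        cand = project([[p - step * g for p, g in zip(pr, gr)] for pr, gr in zip(P, grad)])
        if f(cand) <= f0:
            return cand
        step /= 2.0
    return P  # no improving step found: keep the current factor


def spatial_ls_factorization(X, coords, K, lam=1.0, sigma=1.0, iters=50, seed=0):
    """Alternate projected-gradient updates of G and Q on the penalized criterion."""
    rng = random.Random(seed)
    L = laplacian(coords, sigma)
    Q = project_rows([[rng.random() for _ in range(K)] for _ in X])
    G = [[rng.random() for _ in range(K)] for _ in X[0]]
    history = [objective(X, Q, G, L, lam)]
    for _ in range(iters):
        # Residual R = Q G^T - X; the gradient w.r.t. G is 2 R^T Q.
        R = [[a - b for a, b in zip(ar, xr)] for ar, xr in zip(matmul(Q, transpose(G)), X)]
        grad_G = [[2.0 * v for v in r] for r in matmul(transpose(R), Q)]
        G = pg_step(lambda Gc: objective(X, Q, Gc, L, lam), G, grad_G, clip01)
        # The gradient w.r.t. Q is 2 R G + 2 lam L Q (L is symmetric).
        R = [[a - b for a, b in zip(ar, xr)] for ar, xr in zip(matmul(Q, transpose(G)), X)]
        grad_Q = [[2.0 * a + 2.0 * lam * b for a, b in zip(ra, rb)]
                  for ra, rb in zip(matmul(R, G), matmul(L, Q))]
        Q = pg_step(lambda Qc: objective(X, Qc, G, L, lam), Q, grad_Q, project_rows)
        history.append(objective(X, Q, G, L, lam))
    return Q, G, history


# Toy data (hypothetical): two spatial clusters, genotypes scaled to [0, 1].
coords = [(0.0, 0.0), (0.3, 0.1), (0.1, 0.4), (5.0, 5.0), (5.2, 4.8), (4.9, 5.3)]
X = [[0.0, 0.5, 1.0, 1.0], [0.0, 0.0, 1.0, 0.5], [0.5, 0.0, 1.0, 1.0],
     [1.0, 1.0, 0.0, 0.0], [1.0, 0.5, 0.0, 0.5], [1.0, 1.0, 0.5, 0.0]]
Q, G, history = spatial_ls_factorization(X, coords, K=2, lam=0.5, sigma=1.0, iters=30)
for q in Q:
    print([round(v, 2) for v in q])
```

The backtracking rule only accepts a step that does not increase the criterion, so the recorded objective values never go up. Setting `lam` to 0 removes the geographic term and leaves a plain constrained least-squares factorization. Larger values smooth the ancestry coefficients of neighbouring individuals.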