FAST INFERENCE OF INDIVIDUAL ADMIXTURE COEFFICIENTS USING GEOGRAPHIC DATA

Times cited: 47
Authors
Caye, Kevin [1]
Jay, Flora [2]
Michel, Olivier [3]
Francois, Olivier [1]
Affiliations
[1] Univ Grenoble Alpes, CNRS, TIMC, IMAG,UMR 5525, F-38042 Grenoble, France
[2] Univ Paris Sud, Univ Paris Saclay, CNRS, Lab Rech Informat,UMR 7206,UMR 8623, F-91400 Orsay, France
[3] Univ Grenoble Alpes, CNRS, GIPSA Lab, UMR 5216, F-38042 Grenoble, France
Source
Annals of Applied Statistics, 2018, Vol. 12, No. 1
Keywords
Ancestry estimation algorithms; genotypic data; geographic data; fast algorithms; spatial population structure; genome scans; ancestry; localization; adaptation; components; samples; models; choice; number
DOI
10.1214/17-AOAS1106
Chinese Library Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714
Abstract
Accurately evaluating the distribution of genetic ancestry across geographic space is one of the main questions addressed by evolutionary biologists. This question has commonly been addressed by applying Bayesian estimation programs that allow their users to estimate individual admixture proportions and allele frequencies in putative ancestral populations. Following the explosion of high-throughput sequencing technologies, several algorithms have been proposed to cope with the computational burden generated by the massive data sets these studies produce. In this context, incorporating geographic proximity into ancestry estimation algorithms remains an open statistical and computational challenge. In this study, we introduce new algorithms that use geographic information to estimate ancestry proportions and ancestral genotype frequencies from population genetic data. Our algorithms combine matrix factorization methods and spatial statistics to estimate ancestry matrices by least-squares approximation. We demonstrate the benefit of using spatial algorithms through extensive computer simulations, and we illustrate the new algorithms on a set of spatially referenced samples of the plant species Arabidopsis thaliana. Without loss of statistical accuracy, the new algorithms run much faster than previously developed spatial methods. Our algorithms are implemented in the R package tess3r.
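The abstract describes combining matrix factorization with spatial statistics under a least-squares criterion. The following is a minimal toy-scale sketch of that idea, not the tess3r implementation. It rests on assumptions of our own that are not stated in the abstract. X is an n x p matrix of allele frequencies in [0, 1]. Q (n x K ancestry coefficients, rows on the probability simplex) and G (p x K ancestral frequencies, entries in [0, 1]) minimize ||X - Q G^T||_F^2 + lam * tr(Q^T L Q). Here L is the graph Laplacian of Gaussian weights w_ij = exp(-d_ij^2 / sigma^2) built from sampling coordinates, so the penalty equals (lam / 2) * sum over all i, j of w_ij * ||q_i - q_j||^2 and pulls nearby individuals toward similar ancestry. The solver is a deliberately simple alternating projected gradient scheme. All function names and the parameters lam and sigma are hypothetical.

```python
import math
import random

# Hypothetical sketch: spatially penalized least-squares factorization X ~ Q G^T.


def transpose(A):
    """Return the transpose of a list-of-lists matrix."""
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    """Dense matrix product A @ B for list-of-lists matrices."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def frobenius(A):
    """Frobenius norm, used as a cheap upper bound on the spectral norm."""
    return math.sqrt(sum(a * a for row in A for a in row))


def project_simplex(v):
    """Euclidean projection of v onto {q : q >= 0, sum(q) = 1} (sort-based)."""
    cumsum, theta = 0.0, 0.0
    for j, uj in enumerate(sorted(v, reverse=True), start=1):
        cumsum += uj
        t = (cumsum - 1.0) / j
        if uj - t > 0:
            theta = t  # the last j meeting this condition fixes the threshold
    return [max(x - theta, 0.0) for x in v]


def graph_laplacian(coords, sigma):
    """L = D - W with Gaussian weights w_ij = exp(-d_ij^2 / sigma^2) for i != j."""
    n = len(coords)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(coords[i], coords[j]))
                W[i][j] = math.exp(-d2 / sigma ** 2)
    return [[(sum(W[i]) if i == j else 0.0) - W[i][j] for j in range(n)]
            for i in range(n)]


def residual(X, Q, G):
    """R = X - Q G^T."""
    QGt = matmul(Q, transpose(G))
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, QGt)]


def objective(X, Q, G, lap, lam):
    """||X - Q G^T||_F^2 + lam * tr(Q^T L Q)."""
    fit = sum(r * r for row in residual(X, Q, G) for r in row)
    LQ = matmul(lap, Q)
    smooth = sum(q * lq for rq, rlq in zip(Q, LQ) for q, lq in zip(rq, rlq))
    return fit + lam * smooth


def fit_spatial_factorization(X, coords, K, lam=1.0, sigma=1.0, n_iter=200, seed=0):
    """Alternate projected gradient steps on G (box [0, 1]) and Q (simplex rows)."""
    n, p = len(X), len(X[0])
    lap = graph_laplacian(coords, sigma)
    max_deg = max(lap[i][i] for i in range(n))
    rng = random.Random(seed)
    Q = [project_simplex([rng.random() for _ in range(K)]) for _ in range(n)]
    G = [[rng.random() for _ in range(K)] for _ in range(p)]
    for _ in range(n_iter):
        # G-step: gradient is -2 R^T Q; step = 1 / (Lipschitz bound) keeps descent monotone.
        R = residual(X, Q, G)
        RtQ = matmul(transpose(R), Q)
        step_g = 1.0 / (2.0 * frobenius(matmul(transpose(Q), Q)) + 1e-12)
        G = [[min(max(g + 2.0 * step_g * d, 0.0), 1.0) for g, d in zip(g_row, d_row)]
             for g_row, d_row in zip(G, RtQ)]
        # Q-step: gradient is -2 R G + 2 lam L Q; Gershgorin gives ||L|| <= 2 * max degree.
        R = residual(X, Q, G)
        RG, LQ = matmul(R, G), matmul(lap, Q)
        step_q = 1.0 / (2.0 * frobenius(matmul(transpose(G), G))
                        + 4.0 * lam * max_deg + 1e-12)
        Q = [project_simplex([q - step_q * (-2.0 * a + 2.0 * lam * b)
                              for q, a, b in zip(q_row, rg_row, lq_row)])
             for q_row, rg_row, lq_row in zip(Q, RG, LQ)]
    return Q, G


# Toy usage: two spatial clusters with contrasting allele frequencies.
coords = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (5.0, 5.0), (5.5, 5.0), (5.0, 5.5)]
X = [[1.0, 1.0, 0.0, 0.0] for _ in range(3)] + [[0.0, 0.0, 1.0, 1.0] for _ in range(3)]
Q, G = fit_spatial_factorization(X, coords, K=2)
for row in Q:
    print([round(q, 2) for q in row])
```

Each block update uses a step no larger than the inverse of an upper bound on that block's gradient Lipschitz constant, so the objective cannot increase between iterations. Pure Python keeps the sketch dependency-free; real genotype matrices would call for vectorized linear algebra and a faster solver.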
Pages: 586–608
Number of pages: 23