Fast individual ancestry inference from DNA sequence data leveraging allele frequencies for multiple populations

Times cited: 42

Authors:

Bansal, Vikas [1,2]

Libiger, Ondrej [2]

Affiliations:

[1] Univ Calif San Diego, Dept Pediat, La Jolla, CA 92093 USA

[2] Scripps Translat Sci Inst, La Jolla, CA 92037 USA

Source:

BMC BIOINFORMATICS | 2015 / Vol. 16

Keywords:

Admixture estimation; High-throughput sequencing; Allele frequencies; Maximum likelihood; Ancestry; BFGS algorithm; LOCAL-ANCESTRY; GENETIC-STRUCTURE; RARE VARIANTS; ADMIXTURE; STRATIFICATION; ALGORITHM; ASSOCIATION; DESIGN; IMPACT; COMMON;

DOI:

10.1186/s12859-014-0418-7

Chinese Library Classification (CLC) number:

Q5 [Biochemistry];

Discipline classification code:

071010 ; 081704 ;

Abstract:

Background: Estimation of individual ancestry from genetic data is useful for the analysis of disease association studies, understanding human population history and interpreting personal genomic variation. New, computationally efficient methods are needed for ancestry inference that can effectively utilize existing information about allele frequencies associated with different human populations and can work directly with DNA sequence reads.

Results: We describe a fast method for estimating the relative contribution of known reference populations to an individual's genetic ancestry. Our method utilizes allele frequencies from the reference populations and individual genotype or sequence data to obtain a maximum likelihood estimate of the global admixture proportions using the BFGS optimization algorithm. It accounts for the uncertainty in genotypes present in sequence data by using genotype likelihoods and does not require individual genotype data from external reference panels. Simulation studies and application of the method to real datasets demonstrate that our method is significantly faster than previous methods and has comparable accuracy. Using data from the 1000 Genomes project, we show that estimates of the genome-wide average ancestry for admixed individuals are consistent between exome sequence data and whole-genome low-coverage sequence data. Finally, we demonstrate that our method can be used to estimate admixture proportions using pooled sequence data, making it a valuable tool for controlling for population stratification in sequencing-based association studies that utilize DNA pooling.

Conclusions: Our method is an efficient and versatile tool for estimating ancestry from DNA sequence data and is available from https://sites.google.com/site/vibansal/software/iAdmix.
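The abstract describes the estimator only in outline. Below is a minimal sketch of that kind of estimator. It is not the authors' iAdmix code.

It rests on several assumptions:

- A global-admixture model: an individual's alternate-allele frequency at SNP j is p_j = sum_k q_k f_jk, where q holds the admixture proportions and f_jk the reference-population allele frequencies.
- Hardy-Weinberg genotype probabilities.
- Independent SNPs.
- Per-site genotype likelihoods P(data | g) for g = 0, 1, 2 alternate alleles.

The softmax reparameterization, the simple read-error model in the hypothetical helper `read_genotype_likelihoods`, and all function names are illustrative choices. The sketch relies on NumPy and on SciPy's `scipy.optimize.minimize` with `method="BFGS"`.

```python
import numpy as np
from scipy.optimize import minimize


def softmax(theta):
    """Map unconstrained parameters to admixture proportions on the simplex."""
    z = np.exp(theta - np.max(theta))
    return z / z.sum()


def hwe_genotype_probs(p):
    """P(g = 0, 1, 2 alternate alleles) under Hardy-Weinberg for alt-allele frequency p."""
    return np.stack([(1 - p) ** 2, 2 * p * (1 - p), p ** 2], axis=-1)


def read_genotype_likelihoods(ref_counts, alt_counts, err=0.01):
    """Hypothetical simple read model: P(reads | g) for g = 0, 1, 2.

    Each read independently shows the alt allele with probability
    (g/2)(1 - err) + (1 - g/2) err. Binomial coefficients are omitted
    because they do not depend on g.
    """
    g = np.array([0.0, 1.0, 2.0])
    p_alt = (g / 2) * (1 - err) + (1 - g / 2) * err
    ref = np.asarray(ref_counts, dtype=float)[:, None]
    alt = np.asarray(alt_counts, dtype=float)[:, None]
    return p_alt ** alt * (1 - p_alt) ** ref  # shape (n_snps, 3)


def neg_mean_log_likelihood(theta, freqs, geno_lik):
    """Negative per-SNP average log-likelihood of the admixture parameters.

    freqs:    (n_snps, K) alt-allele frequencies in K reference populations
    geno_lik: (n_snps, 3) P(data | g) for g = 0, 1, 2
    Averaging over SNPs rescales the objective without moving its optimum.
    """
    q = softmax(theta)
    p = np.clip(freqs @ q, 1e-6, 1 - 1e-6)  # individual's admixed allele frequency
    site_lik = np.sum(geno_lik * hwe_genotype_probs(p), axis=1)
    return -np.mean(np.log(site_lik))


def estimate_admixture(freqs, geno_lik):
    """Maximum-likelihood admixture proportions via BFGS on a softmax parameterization."""
    k = freqs.shape[1]
    result = minimize(neg_mean_log_likelihood, np.zeros(k),
                      args=(freqs, geno_lik), method="BFGS")
    return softmax(result.x)


if __name__ == "__main__":
    # Simulated example: two reference populations, true proportions 0.7 / 0.3.
    rng = np.random.default_rng(0)
    n_snps, q_true = 10_000, np.array([0.7, 0.3])
    freqs = rng.uniform(0.05, 0.95, size=(n_snps, 2))
    genotypes = rng.binomial(2, freqs @ q_true)
    geno_lik = np.eye(3)[genotypes]  # known genotypes: likelihood 1 for the observed g
    print(estimate_admixture(freqs, geno_lik))
```

Optimizing over unconstrained softmax parameters lets plain BFGS respect the simplex constraint, so the proportions stay non-negative and sum to 1. The trade-off is that it never returns an exact 0 or 1. A bounded optimizer such as L-BFGS-B is one alternative. For low-coverage sequence data, `read_genotype_likelihoods` output can replace the one-hot matrix. At high depth, computing those likelihoods in log space would avoid underflow.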

Pages: 11