Fast individual ancestry inference from DNA sequence data leveraging allele frequencies for multiple populations

Cited by: 42
Authors
Bansal, Vikas [1, 2]
Libiger, Ondrej [2 ]
Affiliations
[1] Univ Calif San Diego, Dept Pediat, La Jolla, CA 92093 USA
[2] Scripps Translat Sci Inst, La Jolla, CA 92037 USA
Source
BMC BIOINFORMATICS | 2015 / Vol. 16
Keywords
Admixture estimation; High-throughput sequencing; Allele frequencies; Maximum likelihood; Ancestry; BFGS algorithm; LOCAL-ANCESTRY; GENETIC-STRUCTURE; RARE VARIANTS; ADMIXTURE; STRATIFICATION; ALGORITHM; ASSOCIATION; DESIGN; IMPACT; COMMON;
DOI
10.1186/s12859-014-0418-7
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Background: Estimation of individual ancestry from genetic data is useful for the analysis of disease association studies, understanding human population history and interpreting personal genomic variation. New, computationally efficient methods are needed for ancestry inference that can effectively utilize existing information about allele frequencies associated with different human populations and can work directly with DNA sequence reads. Results: We describe a fast method for estimating the relative contribution of known reference populations to an individual's genetic ancestry. Our method utilizes allele frequencies from the reference populations and individual genotype or sequence data to obtain a maximum likelihood estimate of the global admixture proportions using the BFGS optimization algorithm. It accounts for the uncertainty in genotypes present in sequence data by using genotype likelihoods and does not require individual genotype data from external reference panels. Simulation studies and application of the method to real datasets demonstrate that our method is significantly faster than previous methods and has comparable accuracy. Using data from the 1000 Genomes project, we show that estimates of the genome-wide average ancestry for admixed individuals are consistent between exome sequence data and whole-genome low-coverage sequence data. Finally, we demonstrate that our method can be used to estimate admixture proportions using pooled sequence data, making it a valuable tool for controlling for population stratification in sequencing-based association studies that utilize DNA pooling. Conclusions: Our method is an efficient and versatile tool for estimating ancestry from DNA sequence data and is available from https://sites.google.com/site/vibansal/software/iAdmix.
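To make the idea in the abstract concrete, the following is a minimal sketch of maximum likelihood estimation of global admixture proportions from fixed reference allele frequencies. It is not the iAdmix implementation. It assumes hard genotype calls (0, 1 or 2 copies of an allele) rather than the genotype likelihoods described above, and it swaps in a simple EM update in place of the BFGS optimizer so that it runs with the Python standard library only. The function names `log_likelihood`, `estimate_admixture` and `simulate` are hypothetical, and the simulated data are purely illustrative.

```python
import math
import random


def log_likelihood(q, freqs, genotypes):
    """Binomial log-likelihood of the genotypes given admixture proportions q.

    freqs[j][k] is the allele frequency at SNP j in reference population k;
    genotypes[j] is the allele count (0, 1 or 2) in the individual.
    The constant binomial coefficient is omitted.
    """
    total = 0.0
    for f_j, g in zip(freqs, genotypes):
        # Expected allele frequency in the individual: mixture of populations.
        p = sum(qk * fk for qk, fk in zip(q, f_j))
        total += g * math.log(p) + (2 - g) * math.log(1.0 - p)
    return total


def estimate_admixture(freqs, genotypes, max_iter=500, tol=1e-9):
    """Estimate admixture proportions by EM (assumption: EM, not BFGS)."""
    n_pops = len(freqs[0])
    q = [1.0 / n_pops] * n_pops  # start from equal contributions
    for _ in range(max_iter):
        new_q = [0.0] * n_pops
        for f_j, g in zip(freqs, genotypes):
            p = sum(qk * fk for qk, fk in zip(q, f_j))
            for k in range(n_pops):
                # Expected number of the two allele copies drawn from population k.
                new_q[k] += (g * q[k] * f_j[k] / p
                             + (2 - g) * q[k] * (1.0 - f_j[k]) / (1.0 - p))
        new_q = [v / (2.0 * len(genotypes)) for v in new_q]
        converged = max(abs(a - b) for a, b in zip(new_q, q)) < tol
        q = new_q
        if converged:
            break
    return q


def simulate(true_q, n_snps, seed=0):
    """Simulate illustrative frequencies and genotypes (hypothetical data)."""
    rng = random.Random(seed)
    freqs, genotypes = [], []
    for _ in range(n_snps):
        # Keep frequencies away from 0 and 1 so the logarithms stay finite.
        f_j = [rng.uniform(0.05, 0.95) for _ in true_q]
        p = sum(qk * fk for qk, fk in zip(true_q, f_j))
        g = sum(rng.random() < p for _ in range(2))  # two allele draws
        freqs.append(f_j)
        genotypes.append(g)
    return freqs, genotypes


if __name__ == "__main__":
    freqs, genotypes = simulate([0.7, 0.3], n_snps=5000, seed=1)
    print(estimate_admixture(freqs, genotypes))
```

Because the reference frequencies are held fixed, only the handful of admixture proportions are estimated, which is one reason an approach of this kind can be fast. Handling sequence reads directly would replace the hard genotype term with a sum over genotype likelihoods.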
Pages: 11