Probabilistic models of genetic variation in structured populations applied to global human studies

Cited by: 21
Authors
Hao, Wei [1]
Song, Minsun [1]
Storey, John D. [1, 2]
Affiliations
[1] Princeton Univ, Lewis Sigler Inst Integrat Genom, Princeton, NJ 08544 USA
[2] Princeton Univ, Ctr Stat & Machine Learning, Princeton, NJ 08544 USA
Keywords
TRANSCRIPTION FACTOR; SYNTHETIC MAPS; EXPRESSION; CANDIDATE; INFERENCE; CANCER; FOXP1
DOI
10.1093/bioinformatics/btv641
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: Modern population genetics studies typically involve genome-wide genotyping of individuals from a diverse network of ancestries. An important problem is how to formulate and estimate probabilistic models of observed genotypes that account for complex population structure. The most prominent work on this problem has focused on estimating a model of admixture proportions of ancestral populations for each individual. Here, we instead focus on modeling variation of the genotypes without requiring a higher-level admixture interpretation.

Results: We formulate two general probabilistic models, and we propose computationally efficient algorithms to estimate them. First, we show how principal component analysis can be utilized to estimate a general model that includes the well-known Pritchard-Stephens-Donnelly admixture model as a special case. Noting some drawbacks of this approach, we introduce a new 'logistic factor analysis' framework that seeks to directly model the logit transformation of probabilities underlying observed genotypes in terms of latent variables that capture population structure. We demonstrate these advances on data from the Human Genome Diversity Panel and 1000 Genomes Project, where we are able to identify SNPs that are highly differentiated with respect to structure while making minimal modeling assumptions.
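
The two modeling ideas in the abstract can be illustrated with a toy sketch. It is not the authors' algorithm or software. It assumes genotypes coded 0/1/2, simulates them from allele frequencies whose logit is linear in a few latent factors, and then recovers the frequencies with a plain truncated SVD as a stand-in for the PCA-based estimate. The function names, dimensions, factor count, and the clipping step are all hypothetical choices made for this illustration.

```python
import numpy as np


def simulate_genotypes(n_snps=200, n_indiv=50, n_factors=2, seed=0):
    """Simulate genotypes whose allele-frequency logits are linear in latent factors.

    Assumption for this sketch: logit(F) = A @ H, with an intercept row in H,
    and each genotype drawn as Binomial(2, F[i, j]).
    """
    rng = np.random.default_rng(seed)
    A = rng.normal(scale=0.5, size=(n_snps, n_factors + 1))  # SNP loadings
    H = np.vstack([np.ones(n_indiv),                          # intercept row
                   rng.normal(size=(n_factors, n_indiv))])    # latent structure
    F = 1.0 / (1.0 + np.exp(-(A @ H)))                        # inverse logit
    X = rng.binomial(2, F)                                    # genotypes in {0, 1, 2}
    return X, F


def pca_allele_freqs(X, n_factors=2, eps=1e-6):
    """Low-rank estimate of allele frequencies from a truncated SVD of X / 2.

    The clipping is an ad hoc safeguard added here, because a linear low-rank
    fit is not guaranteed to stay inside (0, 1).
    """
    half = X / 2.0
    row_means = half.mean(axis=1, keepdims=True)
    U, s, Vt = np.linalg.svd(half - row_means, full_matrices=False)
    low_rank = (U[:, :n_factors] * s[:n_factors]) @ Vt[:n_factors]
    return np.clip(row_means + low_rank, eps, 1.0 - eps)


X, F_true = simulate_genotypes()
F_hat = pca_allele_freqs(X)
mse_low_rank = float(np.mean((F_hat - F_true) ** 2))
mse_naive = float(np.mean((X / 2.0 - F_true) ** 2))
print(f"low-rank MSE: {mse_low_rank:.4f}, naive X/2 MSE: {mse_naive:.4f}")
```

In this toy setting the low-rank estimate pools information across SNPs and individuals, so it should track the simulated frequencies more closely than the per-entry estimate X / 2. A logit-scale model avoids the need for clipping, since the inverse logit always returns values in (0, 1).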
Pages: 713-721
Page count: 9