On rare variants in principal component analysis of population stratification

Cited by: 17
Authors
Ma, Shengqing [1]
Shi, Gang [1]
Affiliation
[1] Xidian Univ, State Key Lab Integrated Serv Networks, 2 South Taibai Rd, Xian 710071, Shaanxi, Peoples R China
Keywords
Rare variant; Population stratification; Principal component analysis; Single nucleotide polymorphism; Association; Model; Inference
DOI
10.1186/s12863-020-0833-x
Chinese Library Classification (CLC) number
Q3 [Genetics]
Discipline classification code
071007; 090102
Abstract
Background: Population stratification is a known confounder in genome-wide association studies, as it can produce false-positive results. Principal component analysis (PCA) is widely applied to analyze population structure using common variants. However, it remains unclear how well the analysis performs when rare variants are used.
Results: We derive the mathematical expectation of the genetic relationship matrix. The variance and covariance elements of the expected matrix depend explicitly on the allele frequencies of the genetic markers used in the PCA. We show that the inter-population variance is contained solely in K principal components (PCs), and mostly in the largest K-1 PCs, where K is the number of populations in the sample. As measures of population divergence, we propose F_PC, the ratio of the inter-population variance to the intra-population variance in the K population-informative PCs, and d², the sum of squared distances among populations. We show analytically that as allele frequencies become small, the ratio F_PC abates, the population distance d² decreases, and the portion of variance explained by the K PCs diminishes. The results are validated in an analysis of the 1000 Genomes Project data. With common variants (allele frequencies between 0.4 and 0.5), the ratio F_PC is 93.85, the population distance d² is 444.38, and the variance explained by the largest five PCs is 17.09%. With rare variants (frequencies between 0.0001 and 0.01), however, the ratio, distance and percentage decrease to 1.83, 17.83 and 0.74%, respectively.
Conclusions: PCA of population stratification performs worse with rare variants than with common ones. It is necessary to restrict the selection to common variants when analyzing population stratification with sequencing data.
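The abstract defines its divergence measures only in words. As a rough illustration, the following pure-Python sketch builds a genetic relationship matrix, extracts its leading PCs, and computes an F_PC-like ratio, a d²-like distance and the share of variance explained by the top PCs. It is not the authors' implementation, and several choices are assumptions: each SNP is centred by 2p and scaled by sqrt(2p(1-p)) using its sample allele frequency; a PC score is taken as sqrt(lambda) times the eigenvector entry; F_PC is computed as an ANOVA-style F statistic pooled over the supplied PCs; and d² is the sum over population pairs of squared Euclidean distances between population centroids. All function names and the simulated frequency shift in the demo are hypothetical.

```python
import math
import random
from itertools import combinations


def standardized_grm(genotypes):
    """N x N matrix X X^T / M from N x M allele counts (0/1/2); assumed scaling."""
    n = len(genotypes)
    cols = []
    for j in range(len(genotypes[0])):
        p = sum(row[j] for row in genotypes) / (2.0 * n)  # sample allele frequency
        if 0.0 < p < 1.0:  # monomorphic SNPs carry no information and are skipped
            s = math.sqrt(2.0 * p * (1.0 - p))
            cols.append([(row[j] - 2.0 * p) / s for row in genotypes])
    if not cols:
        raise ValueError("no polymorphic SNPs in the sample")
    return [[sum(c[a] * c[b] for c in cols) / len(cols) for b in range(n)]
            for a in range(n)]


def top_eigenpairs(mat, k, iters=500, tol=1e-10, seed=0):
    """Largest k eigenpairs of a symmetric PSD matrix via power iteration + deflation."""
    rng, n = random.Random(seed), len(mat)
    a, pairs = [row[:] for row in mat], []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(a[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            w = [x / norm for x in w]
            converged = max(abs(x - y) for x, y in zip(w, v)) < tol
            v = w
            if converged:
                break
        lam = sum(v[i] * sum(a[i][j] * v[j] for j in range(n)) for i in range(n))
        pairs.append((lam, v))
        # Deflation: remove this component so the next pass finds the next PC.
        a = [[a[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return pairs


def pc_coordinates(pairs):
    """Score of individual i on PC k taken as sqrt(lambda_k) * v_k[i] (assumption)."""
    n = len(pairs[0][1])
    return [[math.sqrt(max(lam, 0.0)) * v[i] for lam, v in pairs] for i in range(n)]


def _centroids(coords, labels):
    dims = range(len(coords[0]))
    members = {p: [c for c, lab in zip(coords, labels) if lab == p] for p in set(labels)}
    return {p: [sum(c[d] for c in cs) / len(cs) for d in dims] for p, cs in members.items()}


def f_pc(coords, labels):
    """Assumed ANOVA-style F: between- over within-population variance, pooled over PCs."""
    n, dims, cent = len(coords), range(len(coords[0])), _centroids(coords, labels)
    k = len(cent)
    grand = [sum(c[d] for c in coords) / n for d in dims]
    ss_between = sum(labels.count(p) * sum((cent[p][d] - grand[d]) ** 2 for d in dims)
                     for p in cent)
    ss_within = sum(sum((c[d] - cent[lab][d]) ** 2 for d in dims)
                    for c, lab in zip(coords, labels))
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def d2(coords, labels):
    """Assumed d^2: sum over population pairs of squared centroid distances."""
    cent = _centroids(coords, labels)
    return sum(sum((x - y) ** 2 for x, y in zip(cent[p], cent[q]))
               for p, q in combinations(sorted(cent), 2))


def variance_explained(grm, pairs):
    """Share of total variance (trace of the GRM) captured by the given PCs."""
    return sum(lam for lam, _ in pairs) / sum(grm[i][i] for i in range(len(grm)))


if __name__ == "__main__":
    rng = random.Random(1)
    labels = ["A"] * 20 + ["B"] * 20  # two hypothetical populations, K = 2
    geno = [[0] * 300 for _ in labels]
    for j in range(300):
        p = rng.uniform(0.4, 0.5)  # common-variant range quoted in the abstract
        freq = {"A": 0.8 * p, "B": 1.2 * p}  # hypothetical population shift
        for i, lab in enumerate(labels):
            geno[i][j] = (rng.random() < freq[lab]) + (rng.random() < freq[lab])
    grm = standardized_grm(geno)
    pairs = top_eigenpairs(grm, k=2)
    coords = pc_coordinates(pairs)
    print(f_pc(coords, labels), d2(coords, labels), variance_explained(grm, pairs))
```

Power iteration with deflation is used only to keep the sketch free of dependencies; a library eigensolver would normally replace it. Drawing p from a rare range such as 0.0001 to 0.01 in the demo illustrates one practical issue: with only 40 individuals, many such SNPs are monomorphic in the sample and are dropped, and standardized_grm raises ValueError if none remain.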
Pages: 11