On rare variants in principal component analysis of population stratification

Cited by: 17
Authors
Ma, Shengqing [1]
Shi, Gang [1]
Affiliations
[1] Xidian Univ, State Key Lab Integrated Serv Networks, 2 South Taibai Rd, Xian 710071, Shaanxi, Peoples R China
Keywords
Rare variant; Population stratification; Principal component analysis; Single nucleotide polymorphism; ASSOCIATION; MODEL; INFERENCE
DOI
10.1186/s12863-020-0833-x
Chinese Library Classification number
Q3 [Genetics]
Discipline classification code
071007; 090102
Abstract
Background: Population stratification is a known confounder of genome-wide association studies because it can lead to false-positive results. Principal component analysis (PCA) is widely applied to the analysis of population structure with common variants. However, how well it performs when rare variants are used remains unclear.
Results: We derive the mathematical expectation of the genetic relationship matrix. The variance and covariance elements of the expected matrix depend explicitly on the allele frequencies of the genetic markers used in the PCA. We show that the inter-population variance is contained solely in K principal components (PCs), and mostly in the largest K-1 PCs, where K is the number of populations in the sample. We propose two measures of population divergence: F_PC, the ratio of the inter-population variance to the intra-population variance in the K population-informative PCs, and d^2, the sum of squared distances among populations. We show analytically that as allele frequencies become small, the ratio F_PC falls, the population distance d^2 decreases, and the portion of variance explained by the K PCs diminishes. The results are validated in an analysis of the 1000 Genomes Project data. With common variants of allele frequencies between 0.4 and 0.5, F_PC is 93.85, d^2 is 444.38, and the variance explained by the largest five PCs is 17.09%. With rare variants of frequencies between 0.0001 and 0.01, these values decrease to 1.83, 17.83, and 0.74%, respectively.
Conclusions: PCA of population stratification performs worse with rare variants than with common ones. When analyzing population stratification with sequencing data, it is necessary to restrict the selection to common variants.
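The PCA workflow that the abstract refers to can be sketched roughly as follows. This is a minimal illustration, not the authors' code: it assumes the commonly used standardized genetic relationship matrix (genotypes centered by 2p and scaled by sqrt(2p(1-p)), with p the sample allele frequency), which the abstract does not spell out. The function names, the two-population setup, and the independently drawn allele frequencies (which exaggerate differentiation) are hypothetical choices made for the example only.

```python
import numpy as np


def simulate_genotypes(pop_freqs, n_per_pop, rng):
    """Draw 0/1/2 allele counts for each population and stack them by rows.

    pop_freqs: array of shape (K, M) holding the allele frequency of each of
    M markers in each of K populations (hypothetical input for this sketch).
    """
    blocks = [rng.binomial(2, freqs, size=(n_per_pop, freqs.size))
              for freqs in np.asarray(pop_freqs)]
    return np.vstack(blocks)


def genetic_relationship_matrix(genotypes):
    """Standardized GRM (assumed definition): Z Z^T / M."""
    g = np.asarray(genotypes, dtype=float)
    p = g.mean(axis=0) / 2.0                 # sample allele frequency per marker
    keep = (p > 0.0) & (p < 1.0)             # monomorphic markers have zero variance
    g, p = g[:, keep], p[keep]
    z = (g - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))
    return z @ z.T / z.shape[1]


def principal_components(grm):
    """Return (fraction of variance per PC, eigenvectors), largest PC first."""
    eigvals, eigvecs = np.linalg.eigh(grm)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]
    eigvals = np.clip(eigvals[order], 0.0, None)  # remove tiny negative round-off
    return eigvals / eigvals.sum(), eigvecs[:, order]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    K, M, n_per_pop = 2, 500, 30
    # Assumption: frequencies drawn independently per population.
    pop_freqs = rng.uniform(0.1, 0.9, size=(K, M))
    genotypes = simulate_genotypes(pop_freqs, n_per_pop, rng)
    explained, pcs = principal_components(genetic_relationship_matrix(genotypes))
    print("variance explained by the largest K-1 PCs:", explained[:K - 1].sum())
    print("mean PC1 score, population 1:", pcs[:n_per_pop, 0].mean())
    print("mean PC1 score, population 2:", pcs[n_per_pop:, 0].mean())
```

To explore the effect of allele frequency, one could rerun the sketch with `pop_freqs` restricted to different frequency ranges and compare the variance explained. The outcome depends on how the population frequencies are simulated, so it should not be read as a reproduction of the figures reported above.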
Pages: 11