Application of t-SNE to human genetic data

Times cited: 130
Authors
Li, Wentian [1]
Cerise, Jane E. [1]
Yang, Yaning [2]
Han, Henry [3]
Affiliations
[1] Feinstein Inst Med Res, Robert S Boas Ctr Genom & Human Genet, Northwell Hlth, Manhasset, NY 11030 USA
[2] Univ Sci & Technol China, Dept Stat & Finance, Hefei, Anhui, Peoples R China
[3] Fordham Univ, Lincoln Ctr, Dept Comp & Informat Sci, New York, NY USA
Keywords
t-SNE; PCA; SNP; dimension reduction; principal-component analysis; family-based tests; population structure; association; algorithm; model; stratification; inference; corrects; risk
DOI
10.1142/S0219720017500172
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
t-distributed stochastic neighbor embedding (t-SNE) is a relatively new dimension reduction and visualization technique for high-dimensional data. It is rarely applied to human genetic data, even though it is commonly used in other data-intensive biological fields such as single-cell genomics. We explore the applicability of t-SNE to human genetic data and make the following observations: (i) like previously used dimension reduction techniques such as principal component analysis (PCA), t-SNE is able to separate samples from different continents; (ii) t-SNE is more robust than PCA to the presence of outliers; (iii) t-SNE is able to display both continental and sub-continental patterns in a single plot. We conclude that the ability of t-SNE to reveal population stratification at different scales could be useful for human genetic association studies.
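The abstract contrasts PCA and t-SNE for revealing population structure in SNP data. As a rough illustration only (not the authors' pipeline, data, or settings), the sketch below simulates a hypothetical genotype matrix for three populations, standardizes each SNP by its allele frequency in the EIGENSTRAT style, and embeds the samples with PCA (via SVD) and with a minimal exact t-SNE written in NumPy: Gaussian input affinities calibrated to a target perplexity, a Student-t kernel in the embedding, early exaggeration, and momentum. The simulation model, the function names (`simulate_genotypes`, `standardize`, `pca`, `tsne`, `separation_ratio`), and all parameter values, including the step-size heuristic n / (4 × exaggeration), are assumptions chosen for the example.

```python
import numpy as np

def simulate_genotypes(n_per_pop=30, n_snps=500, n_pops=3, drift_sd=0.2, seed=0):
    """Hypothetical toy data: 0/1/2 genotypes; each population's allele
    frequencies are a shared base plus Gaussian noise (crude stand-in for drift)."""
    rng = np.random.default_rng(seed)
    base = rng.uniform(0.1, 0.9, size=n_snps)
    blocks, labels = [], []
    for k in range(n_pops):
        freq = np.clip(base + rng.normal(0.0, drift_sd, size=n_snps), 0.01, 0.99)
        blocks.append(rng.binomial(2, freq, size=(n_per_pop, n_snps)))
        labels += [k] * n_per_pop
    return np.vstack(blocks).astype(float), np.array(labels)

def standardize(G):
    """Center each SNP at 2p and scale by sqrt(2p(1-p)); drop monomorphic SNPs."""
    p = G.mean(axis=0) / 2.0
    keep = (p > 0) & (p < 1)
    p = p[keep]
    return (G[:, keep] - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))

def pca(X, k=2):
    """Scores on the top-k principal components, via SVD of the centered matrix."""
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :k] * S[:k]

def _row_affinities(D, perplexity, tol=1e-5, max_iter=60):
    """Conditional p_{j|i} from squared distances D; binary-search the Gaussian
    precision beta_i so each row's entropy (nats) equals log(perplexity)."""
    n = D.shape[0]
    P = np.zeros((n, n))
    target = np.log(perplexity)
    for i in range(n):
        d = np.delete(D[i], i)
        d = d - d.min()                      # shift for numerical stability
        beta, lo, hi = 1.0, 0.0, np.inf
        for _ in range(max_iter):
            w = np.exp(-beta * d)
            p = w / w.sum()
            H = -np.sum(p * np.log(p + 1e-12))
            if abs(H - target) < tol:
                break
            if H > target:                   # too flat: sharpen the kernel
                lo = beta
                beta = beta * 2.0 if np.isinf(hi) else (beta + hi) / 2.0
            else:                            # too peaked: widen the kernel
                hi = beta
                beta = (beta + lo) / 2.0
        P[i, np.arange(n) != i] = p
    return P

def tsne(X, dim=2, perplexity=10.0, n_iter=500, exaggeration=4.0, seed=0):
    """Minimal exact t-SNE: symmetric Gaussian affinities in input space,
    Student-t (1 d.o.f.) affinities in the embedding, momentum + early exaggeration."""
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    D = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    P = _row_affinities(D, perplexity)
    P = np.maximum((P + P.T) / (2.0 * n), 1e-12)   # joint probabilities, sum ~ 1
    lr = n / (4.0 * exaggeration)                   # step-size heuristic (assumption)
    rng = np.random.default_rng(seed)
    Y = rng.normal(0.0, 1e-4, size=(n, dim))
    velocity = np.zeros_like(Y)
    for it in range(n_iter):
        early = it < 100
        exag = exaggeration if early else 1.0
        momentum = 0.5 if early else 0.8
        sqY = np.sum(Y ** 2, axis=1)
        num = 1.0 / (1.0 + sqY[:, None] + sqY[None, :] - 2.0 * Y @ Y.T)
        np.fill_diagonal(num, 0.0)
        Q = np.maximum(num / num.sum(), 1e-12)
        # dC/dy_i = 4 * sum_j (p_ij - q_ij) * (y_i - y_j) / (1 + |y_i - y_j|^2)
        W = (exag * P - Q) * num
        grad = 4.0 * (np.diag(W.sum(axis=1)) - W) @ Y
        velocity = momentum * velocity - lr * grad
        Y = Y + velocity
    return Y

def separation_ratio(Y, labels):
    """Mean between-group distance / mean within-group distance (> 1: groups separate)."""
    d = np.sqrt(((Y[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1))
    same = labels[:, None] == labels[None, :]
    off_diag = ~np.eye(len(labels), dtype=bool)
    return d[~same].mean() / d[same & off_diag].mean()

if __name__ == "__main__":
    G, labels = simulate_genotypes()
    X = standardize(G)
    print("PCA   separation:", separation_ratio(pca(X), labels))
    print("t-SNE separation:", separation_ratio(tsne(X), labels))
```

Running the script prints a between/within distance ratio for each embedding; a value well above 1 means the simulated populations form distinct groups. The toy data contain no outliers and no sub-continental structure, so this sketch does not attempt to reproduce observations (ii) and (iii); a real analysis of many samples would more likely use an optimized approximation such as Barnes-Hut t-SNE rather than this O(n²) loop.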
Pages: 14