Theoretical Foundations of t-SNE for Visualizing High-Dimensional Clustered Data

Authors
Cai, T. Tony [1]
Ma, Rong [2]
Affiliations
[1] Department of Statistics and Data Science, University of Pennsylvania, Philadelphia, PA 19104, United States
[2] Department of Statistics, Stanford University, Stanford, CA 94305, United States
Keywords
Clustered datum; Clusterings; Dimension reduction; Foundation of data science; Low dimensional; Nonlinear dimension; Nonlinear dimension reduction; Stochastic neighbor embedding; T-distributed stochastic neighbor embedding; Theoretical foundations
Abstract
This paper investigates the theoretical foundations of the t-distributed stochastic neighbor embedding (t-SNE) algorithm, a popular nonlinear dimension reduction and data visualization method. A novel theoretical framework for the analysis of t-SNE based on the gradient descent approach is presented. For the early exaggeration stage of t-SNE, we show its asymptotic equivalence to power iterations based on the underlying graph Laplacian, characterize its limiting behavior, and uncover its deep connection to Laplacian spectral clustering, and fundamental principles including early stopping as implicit regularization. The results explain the intrinsic mechanism and the empirical benefits of such a computational strategy. For the embedding stage of t-SNE, we characterize the kinematics of the low-dimensional map throughout the iterations, and identify an amplification phase, featuring the intercluster repulsion and the expansive behavior of the low-dimensional map, and a stabilization phase. The general theory explains the fast convergence rate and the exceptional empirical performance of t-SNE for visualizing clustered data, brings forth the interpretations of the t-SNE visualizations, and provides theoretical guidance for applying t-SNE and selecting its tuning parameters in various applications. ©2022 Tony Cai and Rong Ma.
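The abstract's claim about the early exaggeration stage can be illustrated with a small numerical experiment. What follows is a minimal sketch under stated assumptions, not the authors' construction: it keeps only a linear update of the form Y ← (I − αhL)Y, where L is the unnormalized Laplacian of a Gaussian affinity matrix P, and drops the repulsive part of the t-SNE gradient entirely. The helper names (`gaussian_affinity`, `graph_laplacian`, `cluster_spread`), the fixed bandwidth `sigma`, the synthetic two-cluster data, and the step-size rule are all hypothetical choices made for illustration.

```python
import numpy as np


def gaussian_affinity(X, sigma=1.0):
    """Symmetric Gaussian affinities with zero diagonal, normalized to sum to 1.

    Assumption: a single fixed bandwidth, instead of the perplexity-based
    per-point bandwidths used by t-SNE.
    """
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-sq_dists / (2.0 * sigma**2))
    np.fill_diagonal(W, 0.0)
    return W / W.sum()


def graph_laplacian(P):
    """Unnormalized graph Laplacian L = D - P, with D the diagonal degree matrix."""
    return np.diag(P.sum(axis=1)) - P


def cluster_spread(Y, labels):
    """Root-mean-square distance of the points to their own cluster mean."""
    total = 0.0
    for c in np.unique(labels):
        Yc = Y[labels == c]
        total += ((Yc - Yc.mean(axis=0)) ** 2).sum()
    return np.sqrt(total / len(Y))


# Synthetic data: two well-separated clusters in 5 dimensions.
rng = np.random.default_rng(0)
n = 50
X = np.vstack([rng.normal(0.0, 0.3, (n, 5)), rng.normal(4.0, 0.3, (n, 5))])
labels = np.repeat([0, 1], n)

P = gaussian_affinity(X)
L = graph_laplacian(P)

# Exaggeration factor alpha and step size h. The step size is chosen so that
# alpha * h * lambda_max(L) = 1, which keeps every eigenvalue of M in [0, 1].
alpha = 12.0
h = 1.0 / (alpha * np.linalg.eigvalsh(L)[-1])
M = np.eye(2 * n) - alpha * h * L

# Small random initialization, then repeated multiplication by M.
Y0 = rng.normal(0.0, 1e-2, (2 * n, 2))
Y = Y0.copy()
for _ in range(200):
    Y = M @ Y

spread_before = cluster_spread(Y0, labels)
spread_after = cluster_spread(Y, labels)
print(f"within-cluster spread: {spread_before:.2e} -> {spread_after:.2e}")
```

Repeated multiplication by M damps every component of Y except those in the (near) null space of L, which for well-separated clusters is spanned by the cluster indicator vectors. Points in the same cluster therefore contract toward a common location while the cluster means stay put, which is the same mechanism that Laplacian spectral clustering exploits. Stopping after a finite number of iterations rather than iterating to convergence is where the early-stopping interpretation mentioned in the abstract would enter.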
Related papers
  • [1] Theoretical Foundations of t-SNE for Visualizing High-Dimensional Clustered Data
    Cai, T. Tony
    Ma, Rong
    [J]. JOURNAL OF MACHINE LEARNING RESEARCH, 2022, 23
  • [2] Visualizing Data using t-SNE
    van der Maaten, Laurens
    Hinton, Geoffrey
    [J]. JOURNAL OF MACHINE LEARNING RESEARCH, 2008, 9: 2579-2605
  • [3] t-SNE for Complex Multi-Manifold High-Dimensional Data
    Bian, Rongzheng
    Zhang, Jian
    Zhou, Liang
    Jiang, Peng
    Chen, Baoquan
    Wang, Yunhai
    [J]. Jisuanji Fuzhu Sheji Yu Tuxingxue Xuebao/Journal of Computer-Aided Design and Computer Graphics, 2021, 33 (11): 1746-1754
  • [4] Polygonal Coordinate System: Visualizing high-dimensional data using geometric DR, and a deterministic version of t-SNE
    Flexa, Caio
    Gomes, Walisson
    Moreira, Igor
    Alves, Ronnie
    Sales, Claudomiro
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2021, 175
  • [5] Visualization of High-Dimensional Data by Pairwise Fusion Matrices Using t-SNE
    Husnain, Mujtaba
    Missen, Malik Muhammad Saad
    Mumtaz, Shahzad
    Luqman, Muhammad Muzzamil
    Coustaty, Mickael
    Ogier, Jean-Marc
    [J]. SYMMETRY-BASEL, 2019, 11 (01)
  • [6] Visualizing Clustered Botnet Traffic using t-SNE on Aggregated NetFlows
    Alsadi, Muayyad Saleh
    Hadi, Ali Hussein
    [J]. 2017 INTERNATIONAL CONFERENCE ON NEW TRENDS IN COMPUTING SCIENCES (ICTCS), 2017: 179-184
  • [7] Joint t-SNE for Comparable Projections of Multiple High-Dimensional Datasets
    Wang, Yinqiao
    Chen, Lu
    Jo, Jaemin
    Wang, Yunhai
    [J]. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, 2022, 28 (01): 623-632
  • [8] Laplacian-based Cluster-Contractive t-SNE for High-Dimensional Data Visualization
    Sun, Yan
    Han, Yi
    Fan, Jicong
    [J]. ACM TRANSACTIONS ON KNOWLEDGE DISCOVERY FROM DATA, 2024, 18 (01)
  • [9] An automatic affinity propagation clustering based on improved equilibrium optimizer and t-SNE for high-dimensional data
    Duan, Yuxian
    Liu, Changyun
    Li, Song
    Guo, Xiangke
    Yang, Chunlin
    [J]. INFORMATION SCIENCES, 2023, 623: 434-454
  • [10] Visualizing Time Series Data with Temporal Matching Based t-SNE
    Wong, Kwan Yeung
    Chung, Fu-lai
    [J]. 2019 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2019