Network-based Clustering and Embedding for High-Dimensional Data Visualization

Cited by: 1
Authors
Zhang, Hengyuan [1]
Chen, Xiaowu [1]
Affiliation
[1] Beihang Univ, Sch Comp Sci & Engn, State Key Lab Virtual Real Technol & Syst, Beijing 100191, Peoples R China
Keywords
High-dimensional data; Visualization; Network; Fuzzy clustering; Embedding; COMMUNITY STRUCTURE; TOOL
DOI
10.1109/CADGraphics.2013.45
Chinese Library Classification number
TP31 [Computer software]
Discipline classification codes
081202; 0835
Abstract
We present a novel method for visualizing a high-dimensional dataset as a landscape. The goal is to provide a clear and compact representation that reveals the structure of high-dimensional datasets, so that the size and distinctiveness of clusters can be easily discerned and the relationships among individual points are preserved. Our method is network-based and consists of two main steps: clustering and embedding. First, the similarity graph of the high-dimensional dataset is constructed from the Euclidean distances between data points. For clustering, we propose a new network community detection algorithm that calculates the membership degree of each vertex in each community. For embedding, we introduce a practical algorithm that produces an evenly distributed and regularly shaped layout of data points while preserving the original relationships among individual points. Finally, the landscape-like visualization is produced by assigning altitudes to data points according to their membership degrees and by inserting control points. In the resulting visualization, clusters form highlands, and border data points between clusters show up as valleys. The area and altitude of a highland indicate the size and distinctiveness of the corresponding data cluster, respectively.
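As a rough illustration of the first and last steps described in the abstract, the sketch below builds a similarity graph from Euclidean distances and maps membership degrees to an altitude. It is not the authors' algorithm: the k-nearest-neighbour rule, the `1 / (1 + distance)` edge weight, and the function names `knn_similarity_graph` and `altitude` are assumptions made for this example, and the community detection and embedding steps are not reproduced because the abstract does not specify them.

```python
import math


def knn_similarity_graph(points, k=2):
    """Return an undirected graph as {vertex: {neighbour: weight}}.

    Assumption: each point is linked to its k nearest neighbours by
    Euclidean distance, with weight 1 / (1 + distance). The abstract
    only states that the graph is based on Euclidean distances.
    """
    n = len(points)
    graph = {i: {} for i in range(n)}
    for i in range(n):
        # Distances from point i to every other point, nearest first.
        dists = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )
        for d, j in dists[:k]:
            w = 1.0 / (1.0 + d)
            graph[i][j] = w
            graph[j][i] = w  # store both directions to keep it undirected
    return graph


def altitude(memberships):
    """Hypothetical altitude rule: the largest membership degree.

    A point that clearly belongs to one community gets a high value;
    a border point with split memberships gets a lower one.
    """
    return max(memberships)


# Two well-separated groups of three points each (toy data).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
graph = knn_similarity_graph(points, k=2)
```

With a rule of this kind, a vertex whose membership degrees are split evenly between two communities receives a lower altitude than one dominated by a single community, which matches the abstract's description of clusters as highlands and border points as valleys.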
Pages: 290-297
Page count: 8