Network-based Clustering and Embedding for High-Dimensional Data Visualization

Cited by: 1
Authors
Zhang, Hengyuan [1]
Chen, Xiaowu [1]
Affiliation
[1] Beihang University, School of Computer Science and Engineering, State Key Laboratory of Virtual Reality Technology and Systems, Beijing 100191, People's Republic of China
Keywords
High-dimensional data; Visualization; Network; Fuzzy clustering; Embedding; COMMUNITY STRUCTURE; TOOL
DOI
10.1109/CADGraphics.2013.45
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
We present a novel method to visualize a high-dimensional dataset as a landscape. The goal is to provide a clear and compact representation that reveals the structure of a high-dimensional dataset, so that the size and distinctiveness of clusters can be easily discerned and the relationships among individual points are preserved. Our method is network-based and consists of two main steps: clustering and embedding. First, a similarity graph of the dataset is constructed from the Euclidean distances between data points. For clustering, we propose a new network community detection algorithm that computes the membership degree of each vertex in each community. For embedding, we present a practical algorithm that produces an evenly distributed, regularly shaped layout of the data points while preserving the original relationships among individual points. Finally, the landscape-like visualization is produced by assigning altitudes to data points according to their membership degrees and by inserting control points. In the resulting visualization, clusters form highlands, and border points between clusters appear as valleys. The area and altitude of a highland indicate the size and distinctiveness of the corresponding cluster, respectively.
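The graph construction, fuzzy membership, and altitude steps described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' algorithm. The Gaussian kernel, the `sigma` and `threshold` parameters, the seeded soft label propagation used here as a hypothetical stand-in for the paper's community detection algorithm, and the choice of "largest membership degree" as the altitude are all assumptions made for illustration. The embedding step and control-point insertion are not shown.

```python
import math


def similarity_graph(points, sigma=1.0, threshold=0.1):
    """Weighted similarity graph from pairwise Euclidean distances.

    Assumption: Gaussian kernel weights; edges with weight below `threshold` are dropped.
    """
    n = len(points)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(points[i], points[j])
            s = math.exp(-d * d / (2.0 * sigma * sigma))
            if s >= threshold:
                w[i][j] = w[j][i] = s
    return w


def fuzzy_memberships(w, seeds, iters=100):
    """Soft community memberships via seeded label propagation (hypothetical stand-in).

    Each seed vertex is pinned to its own community; every other vertex repeatedly
    takes the edge-weighted average of its neighbours' memberships, so each row sums to 1.
    """
    n, k = len(w), len(seeds)
    u = [[1.0 / k] * k for _ in range(n)]
    for c, s in enumerate(seeds):
        u[s] = [1.0 if c2 == c else 0.0 for c2 in range(k)]
    seed_set = set(seeds)
    for _ in range(iters):
        new_u = []
        for i in range(n):
            total = sum(w[i])
            if i in seed_set or total == 0.0:
                new_u.append(u[i])  # seeds stay fixed; isolated vertices keep their row
                continue
            new_u.append([sum(w[i][j] * u[j][c] for j in range(n)) / total
                          for c in range(k)])
        u = new_u
    return u


def altitudes(u):
    """Assumed mapping: altitude = largest membership degree of a point.

    Points shared between communities get low altitudes (valleys);
    points firmly inside one community get high altitudes (highlands).
    """
    return [max(row) for row in u]


# Usage: two tight groups plus one point midway between them.
pts = [(0, 0), (0.1, 0), (0, 0.1), (5, 5), (4.9, 5), (5, 4.9), (2.5, 2.5)]
w = similarity_graph(pts, sigma=2.0)
u = fuzzy_memberships(w, seeds=[0, 3])
print([round(a, 2) for a in altitudes(u)])
```

Because the example is point-symmetric about (2.5, 2.5), the midpoint ends up with membership 0.5 in each community and therefore the lowest altitude, mirroring the "border points appear as valleys" behaviour described above; the points inside each group sit higher.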
Pages: 290-297
Number of pages: 8