Network-based Clustering and Embedding for High-Dimensional Data Visualization

Cited by: 1

Authors:

Zhang, Hengyuan [1]

Chen, Xiaowu [1]

Affiliations:

[1] Beihang Univ, Sch Comp Sci & Engn, State Key Lab Virtual Real Technol & Syst, Beijing 100191, Peoples R China

Source:

2013 INTERNATIONAL CONFERENCE ON COMPUTER-AIDED DESIGN AND COMPUTER GRAPHICS (CAD/GRAPHICS) | 2013

Keywords:

High-dimensional data; Visualization; Network; Fuzzy clustering; Embedding; COMMUNITY STRUCTURE; TOOL;

DOI:

10.1109/CADGraphics.2013.45

Chinese Library Classification (CLC) number:

TP31 [Computer software];

Discipline classification codes:

081202 ; 0835 ;

Abstract:

We present a novel method for visualizing a high-dimensional dataset as a landscape. The goal is a clear and compact representation that reveals the structure of the dataset, so that the size and distinctiveness of clusters can be easily discerned and the relationships among individual points are preserved. Our method is network-based and consists of two main steps: clustering and embedding. First, a similarity graph of the high-dimensional dataset is constructed from the Euclidean distances between data points. For clustering, we propose a new network community detection algorithm that calculates the membership degree of each vertex in each community. For embedding, we propose a practical algorithm that produces an evenly distributed and regularly shaped layout of data points while preserving the original relationships among individual points. Finally, the landscape-like visualization is produced by assigning altitudes to data points according to their membership degrees and by inserting control points. In the resulting visualization, clusters form highlands, and border data points between clusters appear as valleys. The area and altitude of a highland indicate the size and distinctiveness of the corresponding cluster, respectively.
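
The first step named in the abstract, building a similarity graph from Euclidean distances, can be illustrated with a minimal sketch. The abstract does not say how edges or weights are chosen, so the k-nearest-neighbour rule, the 1 / (1 + d) weighting, and the function name `knn_similarity_graph` are all assumptions made for illustration, not the authors' construction.

```python
import math

def knn_similarity_graph(points, k=2):
    """Return {i: {j: weight}} for an undirected k-nearest-neighbour graph.

    Assumption: each point is linked to its k closest points, and an edge
    weight is 1 / (1 + Euclidean distance). The paper may use another rule.
    """
    n = len(points)
    graph = {i: {} for i in range(n)}
    for i in range(n):
        # Euclidean distance from point i to every other point, nearest first
        dists = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )
        for d, j in dists[:k]:
            w = 1.0 / (1.0 + d)  # closer points get a larger weight
            graph[i][j] = w
            graph[j][i] = w      # store both directions: undirected graph
    return graph

# Two well-separated groups of 3-D points (toy data, not from the paper)
points = [
    (0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.0, 0.1, 0.0),
    (5.0, 5.0, 5.0), (5.1, 5.0, 5.0),
]
graph = knn_similarity_graph(points, k=1)
```

With this toy input, no edge connects the first three points to the last two, which is the kind of structure a community detection step could then pick up.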

Pages: 290-297

Page count: 8
