iVIBRATE: Interactive visualization-based framework for clustering large datasets

Times cited: 34
Authors
Chen, Keke [1]
Liu, Ling [1]
Affiliation
[1] Georgia Inst Technol, Coll Comp, Atlanta, GA 30332 USA
Keywords
algorithms; design; human factors; reliability;
DOI
10.1145/1148020.1148024
CLC number
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
With continued advances in communication network technology and sensing technology, there is astounding growth in the amount of data produced and made available through cyberspace. Efficient, high-quality clustering of large datasets continues to be one of the most important problems in large-scale data analysis. A commonly used methodology for cluster analysis on large datasets is the three-phase framework of sampling/summarization, iterative cluster analysis, and disk-labeling. There are three known problems with this framework that demand effective solutions. The first problem is how to effectively define and validate irregularly shaped clusters, especially in large datasets; automated algorithms and statistical methods are typically not effective in handling such clusters. The second problem is how to effectively label the entire dataset on disk (disk-labeling) without introducing additional errors, which includes dealing with outliers, irregular clusters, and cluster boundary extension. The third problem is the lack of research on effectively integrating the three phases. In this article, we describe iVIBRATE, an interactive visualization-based three-phase framework for clustering large datasets. The two main components of iVIBRATE are its VISTA visual cluster-rendering subsystem, which invites human interplay into the large-scale iterative clustering process through interactive visualization, and its adaptive ClusterMap labeling subsystem, which offers visualization-guided disk-labeling solutions that are effective in dealing with outliers, irregular clusters, and cluster boundary extension.
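The abstract does not spell out how VISTA renders points or how ClusterMap labels the data on disk. As a rough illustration only, the Python sketch below assumes a star-coordinates-style linear projection with per-dimension weights `alphas` (standing in for the kind of knob a user might adjust interactively) and a hypothetical grid labeler, `GridClusterMap`, loosely in the spirit of visualization-guided disk-labeling. Each occupied grid cell of the 2D view takes the majority label of the sample points that land in it. A record falling in an empty cell borrows the majority label of nearby occupied cells (a crude stand-in for boundary extension), or is marked as an outlier (`-1`) if none are nearby. All names, parameters, and the grid scheme are assumptions, not the authors' implementation.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Point2D = Tuple[float, float]


def star_project(point: Sequence[float], alphas: Sequence[float], c: float = 1.0) -> Point2D:
    """Project a k-dimensional point to 2D, star-coordinates style (assumed scheme).

    Axis i is drawn at angle 2*pi*i/k and scaled by alphas[i], the weight a
    user would tune interactively. Coordinates are assumed pre-normalized.
    """
    k = len(point)
    x = y = 0.0
    for i, (v, a) in enumerate(zip(point, alphas)):
        theta = 2 * math.pi * i / k
        x += a * v * math.cos(theta)
        y += a * v * math.sin(theta)
    return (c / k) * x, (c / k) * y


class GridClusterMap:
    """Hypothetical grid-based labeler over the 2D view (not the paper's ClusterMap)."""

    def __init__(self, cell_size: float) -> None:
        self.cell_size = cell_size
        self.cells: Dict[Tuple[int, int], int] = {}

    def _cell(self, xy: Point2D) -> Tuple[int, int]:
        return math.floor(xy[0] / self.cell_size), math.floor(xy[1] / self.cell_size)

    def fit(self, projected: List[Point2D], labels: List[int]) -> "GridClusterMap":
        # Each occupied cell takes the majority cluster label of the sample points in it.
        votes: Dict[Tuple[int, int], Counter] = {}
        for xy, lab in zip(projected, labels):
            votes.setdefault(self._cell(xy), Counter())[lab] += 1
        self.cells = {cell: cnt.most_common(1)[0][0] for cell, cnt in votes.items()}
        return self

    def label(self, xy: Point2D, extend: int = 1) -> int:
        cx, cy = self._cell(xy)
        if (cx, cy) in self.cells:
            return self.cells[(cx, cy)]
        # Crude boundary extension: vote among occupied cells within `extend` cells.
        nearby = Counter(
            self.cells[(cx + dx, cy + dy)]
            for dx in range(-extend, extend + 1)
            for dy in range(-extend, extend + 1)
            if (cx + dx, cy + dy) in self.cells
        )
        # No labeled cell nearby: treat the record as an outlier.
        return nearby.most_common(1)[0][0] if nearby else -1


if __name__ == "__main__":
    weights = [1.0, 1.0, 1.0, 1.0]
    sample = [(0.9, 0.1, 0.0, 0.0), (0.8, 0.2, 0.0, 0.0), (0.0, 0.0, 0.9, 0.1)]
    labels = [0, 0, 1]  # cluster labels assumed to come from the interactive phase
    view = [star_project(p, weights) for p in sample]
    cmap = GridClusterMap(cell_size=0.05).fit(view, labels)
    # Label an unseen record by projecting it with the same weights.
    print(cmap.label(star_project((0.85, 0.15, 0.0, 0.0), weights)))
```

Labeling each record then costs one projection plus a dictionary lookup, so a single scan over the disk-resident data suffices; `cell_size` and `extend` trade off how far cluster boundaries grow against how many records end up as outliers.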
Another important contribution of iVIBRATE development is the identification of the special issues that arise in integrating the two components and the sampling approach into a coherent framework, together with solutions for improving the reliability of the framework and for minimizing the number of errors generated within the cluster analysis process. We study the effectiveness of the iVIBRATE framework through a walkthrough example on a dataset of one million records, and we experimentally evaluate the iVIBRATE approach using both real-life and synthetic datasets. Our results show that iVIBRATE can efficiently involve the user in the clustering process and generate high-quality clustering results for large datasets.
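The abstract names the sampling approach as one of the pieces to integrate but does not say which sampling method iVIBRATE uses. Purely as an illustration, the sketch below swaps in uniform reservoir sampling (the classic Algorithm R), a standard way to draw a fixed-size sample from a disk-resident dataset in one sequential scan without knowing its size in advance. The function name `reservoir_sample` and its parameters are hypothetical.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(records: Iterable[T], n: int, seed: Optional[int] = None) -> List[T]:
    """Draw a uniform random sample of up to n records in a single pass (Algorithm R)."""
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, rec in enumerate(records):
        if i < n:
            # Fill the reservoir with the first n records.
            reservoir.append(rec)
        else:
            # Record i (0-based) replaces a random slot with probability n / (i + 1).
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < n:
                reservoir[j] = rec
    return reservoir


if __name__ == "__main__":
    # range() stands in for a sequential scan over a million on-disk records.
    sample = reservoir_sample(range(1_000_000), 1000, seed=42)
    print(len(sample))
```

Memory use is bounded by `n` regardless of dataset size, which keeps the interactive clustering phase working on a sample small enough to render, while the full scan is deferred to the disk-labeling phase.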
Pages: 245-294
Page count: 50