iVIBRATE: Interactive visualization-based framework for clustering large datasets

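Cited by: 34
Authors
Chen, Keke [1]
Liu, Ling [1]
Affiliations
[1] Georgia Inst Technol, Coll Comp, Atlanta, GA 30332 USA
Keywords
algorithms; design; human factors; reliability
DOI
10.1145/1148020.1148024
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract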
With continued advances in communication network technology and sensing technology, there is astounding growth in the amount of data produced and made available through cyberspace. Efficient and high-quality clustering of large datasets continues to be one of the most important problems in large-scale data analysis. A commonly used methodology for cluster analysis on large datasets is the three-phase framework of sampling/summarization, iterative cluster analysis, and disk-labeling. There are three known problems with this framework that demand effective solutions. The first problem is how to effectively define and validate irregularly shaped clusters, especially in large datasets. Automated algorithms and statistical methods are typically not effective in handling these particular clusters. The second problem is how to effectively label the entire dataset on disk (disk-labeling) without introducing additional errors, including the solutions for dealing with outliers, irregular clusters, and cluster boundary extension. The third problem is the lack of research on issues related to effectively integrating the three phases. In this article, we describe iVIBRATE, an interactive visualization-based three-phase framework for clustering large datasets. The two main components of iVIBRATE are its VISTA visual cluster-rendering subsystem, which invites human interplay into the large-scale iterative clustering process through interactive visualization, and its adaptive ClusterMap labeling subsystem, which offers visualization-guided disk-labeling solutions that are effective in dealing with outliers, irregular clusters, and cluster boundary extension. Another important contribution of iVIBRATE development is the identification of the special issues presented in integrating the two components and the sampling approach into a coherent framework, as well as the solutions for improving the reliability of the framework and for minimizing the number of errors generated within the cluster analysis process. We study the effectiveness of the iVIBRATE framework through a walkthrough example dataset of a million records, and we experimentally evaluate the iVIBRATE approach using both real-life and synthetic datasets. Our results show that iVIBRATE can efficiently involve the user in the clustering process and generate high-quality clustering results for large datasets.
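The generic three-phase framework named in the abstract (sampling, cluster analysis on the sample, disk-labeling) can be illustrated with a minimal sketch. This is not the iVIBRATE method: plain k-means stands in for the interactive VISTA phase, and a nearest-centroid rule with a distance cutoff stands in for ClusterMap labeling. The function names, the `outlier_radius` parameter, and the `-1` outlier label are all assumptions made for illustration.

```python
import random
from math import dist


def sample_rows(rows, k, seed=0):
    """Phase 1: reservoir-sample k rows from a stream in a single pass."""
    rng = random.Random(seed)
    reservoir = []
    for i, row in enumerate(rows):
        if i < k:
            reservoir.append(row)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = row
    return reservoir


def kmeans(points, k, iters=20, seed=0):
    """Phase 2 stand-in: plain k-means on the sample (not the VISTA process)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: dist(p, centroids[c]))
            groups[nearest].append(p)
        # Recompute each centroid as the mean of its group; keep it if empty.
        centroids = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else centroids[c]
            for c, g in enumerate(groups)
        ]
    return centroids


def label_stream(rows, centroids, outlier_radius):
    """Phase 3 stand-in: one pass over the full data, nearest-centroid labels.

    Rows farther than outlier_radius from every centroid get label -1
    (a hypothetical convention, not ClusterMap's).
    """
    for row in rows:
        c = min(range(len(centroids)), key=lambda i: dist(row, centroids[i]))
        yield c if dist(row, centroids[c]) <= outlier_radius else -1
```

A spherical rule like this is exactly what struggles with the irregular clusters and boundary extension the abstract highlights, which is the gap the visualization-guided labeling is meant to address.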
Pages: 245-294
Page count: 50