Hierarchical clustering for multiple nominal data streams with evolving behaviour

Authors
Jerry W. Sangma
Mekhla Sarkar
Vipin Pal
Amit Agrawal
Affiliations
[1] National Institute of Technology Meghalaya
[2] Chang Gung University
[3] Wells Fargo & Company
Keywords
Data streams; Hierarchical clustering; Concept evolution; Nominal data
DOI
Not available
Abstract
Over the past decade, a number of attempts have been made at data stream clustering, but most of them follow the clustering-by-example approach. A number of applications instead require the clustering-by-variable approach, which clusters multiple data streams rather than the data examples within a single stream. Furthermore, only a few works address the clustering of multiple data streams, and these are applicable to numeric data streams only. This research gap motivates the present work, which proposes a hierarchical clustering technique for multiple data streams whose data are nominal. To address concept changes in the data streams, clusters in the hierarchical structure are split and merged. The decision to split or merge is based on an entropy measure that represents the cluster's degree of disparity. The performance of the proposed technique is analysed and compared with the Agglomerative Nesting clustering technique on both synthetic data and a real-world dataset in terms of the Dunn Index, the Modified Hubert $\Gamma$ statistic, the Cophenetic Correlation Coefficient, and Purity. The proposed technique outperforms the Agglomerative Nesting clustering technique on concept-evolving data streams. Furthermore, the effect of concept evolution on the clustering structure and on average entropy is visualised for detailed analysis and understanding.
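To make the entropy-based split decision more concrete, the following minimal sketch shows one way a disparity score could be computed for a cluster of nominal streams. It is an illustration under stated assumptions, not the authors' method: the abstract does not define the entropy measure, so the functions `shannon_entropy`, `cluster_disparity`, and `should_split`, the per-time-step averaging, and the `threshold` parameter are all hypothetical.

```python
from collections import Counter
from math import log2


def shannon_entropy(symbols):
    """Shannon entropy (in bits) of a non-empty sequence of nominal symbols."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((c / total) * log2(c / total) for c in counts.values())


def cluster_disparity(streams):
    """Hypothetical disparity score for a cluster of equal-length streams.

    Assumption (not from the paper): for each time step, take the entropy of
    the symbols emitted by all streams at that step, then average over steps.
    Streams that always agree score 0; disagreement raises the score.
    """
    columns = list(zip(*streams))  # one tuple of symbols per time step
    return sum(shannon_entropy(col) for col in columns) / len(columns)


def should_split(streams, threshold=0.5):
    """Hypothetical rule: split when disparity exceeds an assumed threshold."""
    return cluster_disparity(streams) > threshold


if __name__ == "__main__":
    agreeing = [["a", "b", "c"], ["a", "b", "c"]]
    diverging = [["a", "b", "c"], ["b", "c", "a"]]
    print(cluster_disparity(agreeing), should_split(agreeing))
    print(cluster_disparity(diverging), should_split(diverging))
```

A merge rule could be sketched symmetrically, for example by merging two sibling clusters when the disparity of their union stays below the threshold; that, too, would be an assumption rather than something stated in the abstract.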
Pages: 1737-1761
Page count: 24