Incremental entropy-based clustering on categorical data streams with concept drift

Cited by: 25
Authors
Li, Yanhong
Li, Deyu [1]
Wang, Suge
Zhai, Yanhui
Affiliation
[1] Shanxi Univ, Sch Comp & Informat Technol, Taiyuan 030006, Shanxi, Peoples R China
Keywords
Categorical data stream; Clustering; Data labeling; Concept drift detection; Cluster evolving analysis; FRAMEWORK;
DOI
10.1016/j.knosys.2014.02.004
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Clustering on categorical data streams is a relatively new field that has received less attention than clustering of static data or numerical data streams. One of the main difficulties in categorical data analysis is the lack of an appropriate way to define a similarity or dissimilarity measure on the data. In this paper, we propose three dissimilarity measures: a point-cluster dissimilarity measure and a cluster-cluster dissimilarity measure, both based on incremental entropy, and a dissimilarity measure between two cluster distributions, based on sample standard deviation. We then propose an integrated framework for clustering categorical data streams with three algorithms: Minimal Dissimilarity Data Labeling (MDDL), Concept Drift Detection (CDD) and Cluster Evolving Analysis (CEA). We also compare the proposed algorithms with other algorithms on several data streams synthesized from real data sets. Experiments show that the proposed algorithms are more effective in generating clustering results and detecting concept drift. (C) 2014 Elsevier B.V. All rights reserved.
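The abstract names the measures but does not define them. The Python sketch below illustrates, under stated assumptions, how entropy-based dissimilarities and minimal-dissimilarity labeling can work on categorical records; it is not the authors' method. Assumptions: a cluster's entropy is the sum over attributes of the Shannon entropy of that attribute's value distribution; "incremental entropy" is taken to be the increase in size-weighted entropy |C|·E(C) when a point joins a cluster or two clusters merge; and `distribution_dissimilarity` is a hypothetical stand-in that takes the sample standard deviation of the per-cluster proportion differences between two windows of labels. All function names are illustrative.

```python
import math
from collections import Counter
from statistics import stdev


def cluster_entropy(records):
    """Assumed definition: sum over attributes of the Shannon entropy (bits)
    of each attribute's value distribution within the cluster."""
    if not records:
        return 0.0
    n = len(records)
    total = 0.0
    for attr in range(len(records[0])):
        counts = Counter(r[attr] for r in records)
        total -= sum((c / n) * math.log2(c / n) for c in counts.values())
    return total


def weighted_entropy(records):
    # Size-weighted entropy |C| * E(C); a hypothetical weighting choice.
    return len(records) * cluster_entropy(records)


def point_cluster_dissimilarity(x, cluster):
    # Increase in size-weighted entropy when record x joins the cluster.
    return weighted_entropy(cluster + [x]) - weighted_entropy(cluster)


def cluster_cluster_dissimilarity(c1, c2):
    # Increase in size-weighted entropy when two clusters merge (never negative).
    return weighted_entropy(c1 + c2) - weighted_entropy(c1) - weighted_entropy(c2)


def label_points(points, clusters):
    """Minimal-dissimilarity labeling: each point gets the index of the
    cluster with the smallest point-cluster dissimilarity. The clusters
    themselves are not updated while labeling."""
    return [
        min(range(len(clusters)),
            key=lambda i: point_cluster_dissimilarity(x, clusters[i]))
        for x in points
    ]


def distribution_dissimilarity(labels_a, labels_b, k):
    """Hypothetical stand-in: sample standard deviation of the differences
    between the per-cluster label proportions of two windows (needs k >= 2)."""
    pa = [labels_a.count(i) / len(labels_a) for i in range(k)]
    pb = [labels_b.count(i) / len(labels_b) for i in range(k)]
    return stdev([a - b for a, b in zip(pa, pb)])


if __name__ == "__main__":
    clusters = [
        [("red", "small"), ("red", "small"), ("red", "large")],
        [("blue", "large"), ("blue", "large")],
    ]
    window = [("red", "small"), ("blue", "large")]
    print(label_points(window, clusters))  # [0, 1]
```

In this sketch each record in the window is assigned to the cluster whose entropy grows least when the record is added. A drift check in the same spirit would compare `distribution_dissimilarity` for consecutive windows against a threshold; the threshold and the window size are left open here because the abstract does not specify them.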
Pages: 33-47
Number of pages: 15