FHC-NDS: Fuzzy Hierarchical Clustering of Multiple Nominal Data Streams

Cited by: 5
Authors
Sangma, Jerry W. [1]
Yogita [1]
Pal, Vipin [1]
Kumar, Neeraj [2, 3, 4, 5]
Kushwaha, Riti [6]
Affiliations
[1] Natl Inst Technol Meghalaya, Shillong 793003, India
[2] Thapar Inst Engn & Technol, Patiala 147004, India
[3] Univ Petr & Energy Studies, Sch Comp Sci, Dehra Dun 248001, Uttarakhand, India
[4] King Abdulaziz Univ, Jeddah 21589, Saudi Arabia
[5] Asia Univ, Dept Comp Sci & Informat Engn, Taichung 41354, Taiwan
[6] Bennett Univ, Noida 201310, India
Keywords
Measurement; Data mining; Entropy; Clustering methods; Time series analysis; Indexes; Merging; Clustering; data streams; fuzzy; hierarchical
DOI
10.1109/TFUZZ.2022.3189083
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The need for fuzzy clustering arises in many real-world applications, such as grouping users by their web browsing behavior, where the behavior of one user can resemble that of two different sets of users at the same time. Fuzzy clustering is even more apt for data streams because of their concept-evolving nature. Data streams can be clustered following either a clustering-by-variable approach or a clustering-by-example approach. Most existing fuzzy clustering-by-variable methods are applicable to numeric data streams only. In this article, a fuzzy hierarchical clustering method is proposed for clustering multiple nominal data streams using the clustering-by-variable approach. The fuzzy affinity of data streams to different clusters is calculated using normalized cosine similarity to the cluster centroids. The method handles concept evolution by updating the hierarchical clustering structure, merging and/or splitting nodes depending on the extent to which the node entropy changes. The performance of the proposed method is analyzed and compared with hierarchical clustering for multiple nominal data streams (HCND), semifuzzy online divisive-agglomerative clustering, and nTreeClus on synthetic and real-world web-browsing datasets, where it outperformed all three in terms of cluster quality as quantified by the Dunn index, the modified Hubert Γ statistic, and the adjusted Rand index. Furthermore, the experimental results show that the proposed method is highly promising at capturing fuzzy clusters, as indicated by the Xie-Beni index, partition coefficient, and partition entropy.
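The abstract mentions fuzzy affinities computed from normalized cosine similarity to cluster centroids, and it reports the partition coefficient and partition entropy. The Python sketch below illustrates these ideas only in general terms and is not the authors' implementation. Several things are assumptions made for illustration: representing a window of a nominal stream as a relative-frequency vector, normalizing similarities by their sum, and all function names (`frequency_vector`, `fuzzy_memberships`, and so on).

```python
import math
from collections import Counter


def frequency_vector(window, categories):
    """Relative frequency of each category in a window of nominal values.

    Assumption: a nominal stream is summarized by category frequencies.
    """
    counts = Counter(window)
    total = len(window)
    if total == 0:
        return [0.0] * len(categories)
    return [counts[c] / total for c in categories]


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def fuzzy_memberships(vector, centroids):
    """Memberships proportional to cosine similarity, scaled to sum to 1.

    Assumption: "normalized" means dividing each similarity by their sum.
    """
    sims = [cosine_similarity(vector, c) for c in centroids]
    total = sum(sims)
    if total == 0:
        # No similarity to any centroid: fall back to uniform memberships.
        return [1.0 / len(centroids)] * len(centroids)
    return [s / total for s in sims]


def partition_coefficient(memberships):
    """Mean of squared memberships; 1.0 indicates a crisp partition."""
    return sum(u * u for row in memberships for u in row) / len(memberships)


def partition_entropy(memberships):
    """Mean Shannon entropy of the membership rows; 0.0 indicates crisp."""
    total = sum(u * math.log(u) for row in memberships for u in row if u > 0)
    return -total / len(memberships)


if __name__ == "__main__":
    # Hypothetical page categories and centroids, for illustration only.
    categories = ["news", "shop", "video"]
    centroids = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    windows = [
        ["news", "shop", "news", "shop"],  # equally close to both centroids
        ["news", "news", "news", "video"],  # mostly the first centroid
    ]
    rows = [fuzzy_memberships(frequency_vector(w, categories), centroids)
            for w in windows]
    print(rows)
    print(partition_coefficient(rows), partition_entropy(rows))
```

Because frequency vectors are non-negative, every cosine similarity lies in [0, 1], so dividing by the sum yields memberships that can be read as degrees of belonging. A stream that splits its behavior evenly between two clusters receives equal membership in both.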
Pages: 786-798
Number of pages: 13