FHC-NDS: Fuzzy Hierarchical Clustering of Multiple Nominal Data Streams

Cited by: 5
Authors
Sangma, Jerry W. [1]
Yogita [1]
Pal, Vipin [1]
Kumar, Neeraj [2, 3, 4, 5]
Kushwaha, Riti [6]
Affiliations
[1] Natl Inst Technol Meghalaya, Shillong 793003, India
[2] Thapar Inst Engn & Technol, Patiala 147004, India
[3] Univ Petr & Energy Studies, Sch Comp Sci, Dehra Dun 248001, Uttarakhand, India
[4] King Abdulaziz Univ, Jeddah 21589, Saudi Arabia
[5] Asia Univ, Dept Comp Sci & Informat Engn, Taichung 41354, Taiwan
[6] Bennett Univ, Noida 201310, India
Keywords
Measurement; Data mining; Entropy; Clustering methods; Time series analysis; Indexes; Merging; Clustering; data streams; fuzzy; hierarchical
DOI
10.1109/TFUZZ.2022.3189083
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The need for fuzzy clustering arises in many real-world applications, such as grouping users by their web-browsing behavior, where a user's behavior can be similar to that of two different sets of users at the same time. The concept-evolving nature of data streams makes fuzzy clustering even more apt for them. Data streams can be clustered by following either a clustering-by-variable approach or a clustering-by-example approach. Most existing fuzzy clustering-by-variable methods are applicable to numeric data streams only. In this article, a fuzzy hierarchical clustering method is proposed for clustering multiple nominal data streams using the clustering-by-variable approach. The fuzzy affinity of a data stream to each cluster is calculated using its normalized cosine similarity to the cluster centroids. The method handles concept evolution by updating the hierarchical clustering structure, merging and/or splitting nodes depending on the extent to which the node entropy changes. The performance of the proposed method is analyzed and compared with hierarchical clustering for multiple nominal data streams (HCND), semifuzzy online divisive-agglomerative clustering, and nTreeClus on synthetic as well as real-world web-browsing datasets, where it outperformed all three in terms of cluster quality as quantified by the Dunn index, the modified Hubert Γ statistic, and the adjusted Rand index. Furthermore, the experimental results show that the proposed method is highly promising at capturing fuzzy clusters, as indicated by the Xie-Beni index, partition coefficient, and partition entropy.
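As a rough illustration of the membership step and two of the fuzzy validity indices named in the abstract, here is a minimal Python sketch. It is not the authors' implementation. It assumes each nominal stream is summarized as a non-negative frequency vector over its categories, and that "normalized" means dividing each cosine similarity by the sum over all centroids; both are assumptions, and the function names are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def fuzzy_memberships(stream_vec, centroids):
    """Memberships of one stream in each cluster.

    Assumption: similarities are normalized by their sum so that the
    memberships add up to 1; the paper's exact normalization may differ.
    """
    sims = [cosine(stream_vec, c) for c in centroids]
    total = sum(sims)
    if total == 0:
        # No similarity to any centroid: fall back to uniform membership.
        return [1.0 / len(centroids)] * len(centroids)
    return [s / total for s in sims]

def partition_coefficient(U):
    """Mean of squared memberships; 1.0 for a crisp partition, 1/c when fully fuzzy."""
    return sum(u * u for row in U for u in row) / len(U)

def partition_entropy(U):
    """Mean membership entropy (natural log); 0.0 for a crisp partition."""
    return -sum(u * math.log(u) for row in U for u in row if u > 0) / len(U)

if __name__ == "__main__":
    # Hypothetical category-frequency vectors for two cluster centroids.
    centroids = [[8, 1, 1], [1, 1, 8]]
    # Three hypothetical streams, the last one sitting between both clusters.
    streams = [[9, 1, 0], [0, 2, 8], [5, 1, 5]]
    U = [fuzzy_memberships(s, centroids) for s in streams]
    for row in U:
        print([round(u, 3) for u in row])
    print("PC:", round(partition_coefficient(U), 3))
    print("PE:", round(partition_entropy(U), 3))
```

A stream whose frequency vector lies between two centroids receives comparable memberships in both clusters, which is the overlapping behavior the abstract motivates with the web-browsing example.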
Pages: 786-798
Number of pages: 13