FHC-NDS: Fuzzy Hierarchical Clustering of Multiple Nominal Data Streams

Cited by: 5
Authors
Sangma, Jerry W. [1]
Yogita [1]
Pal, Vipin [1]
Kumar, Neeraj [2, 3, 4, 5]
Kushwaha, Riti [6]
Affiliations
[1] Natl Inst Technol Meghalaya, Shillong 793003, India
[2] Thapar Inst Engn & Technol, Patiala 147004, India
[3] Univ Petr & Energy Studies, Sch Comp Sci, Dehra Dun 248001, Uttarakhand, India
[4] King Abdulaziz Univ, Jeddah 21589, Saudi Arabia
[5] Asia Univ, Dept Comp Sci & Informat Engn, Taichung 41354, Taiwan
[6] Bennett Univ, Noida 201310, India
Keywords
Measurement; Data mining; Entropy; Clustering methods; Time series analysis; Indexes; Merging; Clustering; data streams; fuzzy; hierarchical;
DOI
10.1109/TFUZZ.2022.3189083
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The need for fuzzy clustering arises in many real-world applications, such as grouping users by their web-browsing behavior, where a user's behavior can resemble that of two different sets of users at the same time. Fuzzy clustering is even better suited to data streams because of their concept-evolving nature. Data streams can be clustered by following either a clustering-by-variable approach or a clustering-by-example approach. Most existing fuzzy clustering-by-variable methods apply only to numeric data streams. In this article, a fuzzy hierarchical clustering method is proposed for clustering multiple nominal data streams using the clustering-by-variable approach. The fuzzy affinity of data streams to different clusters is calculated using normalized cosine similarity to the cluster centroids. The method handles concept evolution by updating the hierarchical clustering structure, merging and/or splitting nodes depending on the extent to which the node entropy changes. The performance of the proposed method is analyzed and compared with hierarchical clustering for multiple nominal data streams (HCND), semifuzzy online divisive-agglomerative clustering, and nTreeClus on synthetic data as well as a real-world web-browsing dataset, where it outperformed all three in terms of cluster quality as quantified by the Dunn index, the modified Hubert Γ statistic, and the adjusted Rand index. Furthermore, the experimental results show that the proposed method is highly promising at capturing fuzzy clusters, as indicated by the Xie-Beni index, partition coefficient, and partition entropy.
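The abstract's membership step can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how a nominal stream is represented, so the category-frequency vector (`frequency_vector`), the sum-to-one normalization in `fuzzy_memberships`, and all function names here are assumptions made for illustration. The partition coefficient and partition entropy follow their standard textbook definitions.

```python
import math
from collections import Counter


def frequency_vector(stream, alphabet):
    """Assumed representation: count of each nominal symbol in the stream."""
    counts = Counter(stream)
    return [counts[symbol] for symbol in alphabet]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def fuzzy_memberships(vec, centroids):
    """Assumed normalization: divide each similarity by the sum over clusters."""
    sims = [cosine(vec, c) for c in centroids]
    total = sum(sims)
    if total == 0:
        # No similarity to any centroid: fall back to uniform membership.
        return [1.0 / len(centroids)] * len(centroids)
    return [s / total for s in sims]


def partition_coefficient(memberships):
    """Mean of squared memberships; 1.0 means a crisp partition."""
    return sum(u * u for row in memberships for u in row) / len(memberships)


def partition_entropy(memberships):
    """Mean of -u*ln(u) over positive memberships; 0.0 means a crisp partition."""
    return -sum(
        u * math.log(u) for row in memberships for u in row if u > 0
    ) / len(memberships)


if __name__ == "__main__":
    alphabet = ["a", "b", "c"]          # hypothetical nominal symbols
    centroids = [[1, 0, 0], [0, 1, 0]]  # hypothetical cluster centroids
    vec = frequency_vector("aab", alphabet)
    memberships = fuzzy_memberships(vec, centroids)
    print(memberships)
    print(partition_coefficient([memberships]))
    print(partition_entropy([memberships]))
```

Because frequency vectors are non-negative, every cosine similarity lies in [0, 1], so dividing by the sum gives memberships that are non-negative and add up to one. A stream that is equally similar to two centroids receives equal membership in both, which is the overlapping behavior the abstract motivates.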
Pages: 786-798
Number of pages: 13
Related papers
50 in total
  • [1] Hierarchical clustering for multiple nominal data streams with evolving behaviour
    Sangma, Jerry W.
    Sarkar, Mekhla
    Pal, Vipin
    Agrawal, Amit
    Yogita
    [J]. COMPLEX & INTELLIGENT SYSTEMS, 2022, 8 (02) : 1737 - 1761
  • [2] Hierarchical clustering for multiple nominal data streams with evolving behaviour
    Jerry W. Sangma
    Mekhla Sarkar
    Vipin Pal
    Amit Agrawal
    [J]. Complex & Intelligent Systems, 2022, 8 : 1737 - 1761
  • [3] On Fuzzy Clustering Algorithms for Nominal Data
    Kanzawa, Yuchi
    [J]. 2020 JOINT 11TH INTERNATIONAL CONFERENCE ON SOFT COMPUTING AND INTELLIGENT SYSTEMS AND 21ST INTERNATIONAL SYMPOSIUM ON ADVANCED INTELLIGENT SYSTEMS (SCIS-ISIS), 2020, : 234 - 238
  • [4] An Effective Performance of Fuzzy Hierarchical Clustering Using Time Series Data Streams
    Kavitha, V.
    Punithavalli, M.
    [J]. COMPUTER NETWORKS AND INFORMATION TECHNOLOGIES, 2011, 142 : 242 - +
  • [5] Clustering Multiple Data Streams
    Balzanella, Antonio
    Lechevallier, Yves
    Verde, Rosanna
    [J]. NEW PERSPECTIVES IN STATISTICAL MODELING AND DATA ANALYSIS, 2011, : 247 - 254
  • [6] Clustering on demand for multiple data streams
    Dai, BR
    Huang, JW
    Yeh, MY
    Chen, MS
    [J]. FOURTH IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS, 2004, : 367 - 370
  • [7] On Fuzzy Clustering of Data Streams with Concept Drift
    Jaworski, Maciej
    Duda, Piotr
    Pietruczuk, Lena
    [J]. ARTIFICIAL INTELLIGENCE AND SOFT COMPUTING, PT II, 2012, 7268 : 82 - 91
  • [8] On Resources Optimization in Fuzzy Clustering of Data Streams
    Jaworski, Maciej
    Pietruczuk, Lena
    Duda, Piotr
    [J]. ARTIFICIAL INTELLIGENCE AND SOFT COMPUTING, PT II, 2012, 7268 : 92 - 99
  • [9] ODAC: Hierarchical Clustering of Time Series Data Streams
    Rodrigues, Pedro Pereira
    Gama, Joao
    Pedroso, Joao Pedro
    [J]. PROCEEDINGS OF THE SIXTH SIAM INTERNATIONAL CONFERENCE ON DATA MINING, 2006, : 499 - 503
  • [10] Hierarchical clustering of time-series data streams
    Rodrigues, Pedro Pereira
    Gama, Joao
    Pedroso, Joao Pedro
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2008, 20 (05) : 615 - 627