An evolving approach to the similarity-based modeling for online clustering in non-stationary environments

Cited by: 0
Authors
Almeida, Nayron Morais [1]
Camargos, Murilo Osorio [2]
Mariano, Denis G. B. [3]
Bomfim, Carlos H. M. [3]
Palhares, Reinaldo M. [3]
Caminhas, Walmir M. [3]
Affiliations
[1] Univ Fed Minas Gerais, Grad Program Elect Engn, Ave Antonio Carlos 6627, BR-31270901 Belo Horizonte, MG, Brazil
[2] Univ Estadual Montes Claros, Grad Program Comp Modeling & Syst, Ave Rui Braga S-N, BR-39401089 Montes Claros, MG, Brazil
[3] Univ Fed Minas Gerais, Dept Elect Engn, Ave Antonio Carlos 6627, BR-31270901 Belo Horizonte, MG, Brazil
Keywords
Clustering; Online; Evolving; Similarity-based modeling; Data stream
DOI
10.1007/s12530-024-09646-w
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper proposes a novel evolving approach based on the Similarity-Based Modeling (SBM), a technique widely used in industrial applications of anomaly detection and multiclass classification. The proposed approach, which inherits from SBM, uses a simple model-matrix composed of historical points to represent each cluster. Its inference procedure for a given input instance consists only of generating an estimate, considering each cluster, and then assigning the input to the most similar cluster according to a novel membership function that considers approximation error and data density. The main features of our approach include a simple and intuitive learning scheme, the ability to model clusters of any shape without using micro-cluster-like procedures, robustness to noisy data, and low computational effort. We evaluate the effectiveness of the proposed approach on fifteen datasets widely used in the literature, assessing its ability to deal with overlapping clusters, clusters with arbitrary shape, noisy data, and high dimensionality. Using Adjusted Rand Index (ARI) and Purity metrics, the proposed algorithm was compared with eight recent state-of-the-art algorithms, and the proposed method achieved the highest performance on most of the datasets. On the remaining datasets, it showed similar performance to other methods. Averaging over the fifteen datasets, our approach achieved an ARI value of 0.8872 and a Purity value of 0.9107. The most competitive method, considering ARI, achieved an average value of 0.6988, and considering Purity, achieved an average value of 0.9257. This shows the effectiveness of the proposed approach.
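The two evaluation metrics named in the abstract, Purity and the Adjusted Rand Index (ARI), can be illustrated with a minimal sketch. It follows the standard textbook definitions of both metrics and is not the authors' evaluation code; the function names `purity` and `adjusted_rand_index` and the toy labels are assumptions made for illustration only.

```python
from collections import Counter
from math import comb


def purity(true_labels, pred_labels):
    """Fraction of points that belong to the majority true class of their cluster."""
    by_cluster = {}
    for t, p in zip(true_labels, pred_labels):
        by_cluster.setdefault(p, Counter())[t] += 1
    return sum(max(c.values()) for c in by_cluster.values()) / len(true_labels)


def adjusted_rand_index(true_labels, pred_labels):
    """Pair-counting agreement between two partitions, corrected for chance."""
    n = len(true_labels)
    if n < 2:
        return 1.0  # no pairs to compare; treated as perfect agreement here
    # Pairs of points that share both the true class and the predicted cluster.
    sum_cells = sum(comb(v, 2) for v in Counter(zip(true_labels, pred_labels)).values())
    # Pairs that share the true class, and pairs that share the predicted cluster.
    sum_true = sum(comb(v, 2) for v in Counter(true_labels).values())
    sum_pred = sum(comb(v, 2) for v in Counter(pred_labels).values())
    expected = sum_true * sum_pred / comb(n, 2)
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both partitions are a single cluster
    return (sum_cells - expected) / (max_index - expected)


if __name__ == "__main__":
    # Toy labels (hypothetical): two true classes, three predicted clusters.
    y_true = [0, 0, 0, 1, 1, 1]
    y_pred = [0, 0, 1, 1, 2, 2]
    print("Purity:", purity(y_true, y_pred))
    print("ARI:", adjusted_rand_index(y_true, y_pred))
```

Purity rewards splitting the data into many small clusters, whereas ARI penalizes that, which is one reason the two metrics can rank methods differently.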
Pages: 30