An evolving approach to the similarity-based modeling for online clustering in non-stationary environments

Cited by: 0

Authors:

Almeida, Nayron Morais ^{[1]}

Camargos, Murilo Osorio ^{[2]}

Mariano, Denis G. B. ^{[3]}

Bomfim, Carlos H. M. ^{[3]}

Palhares, Reinaldo M. ^{[3]}

Caminhas, Walmir M. ^{[3]}

Affiliations:

[1] Univ Fed Minas Gerais, Grad Program Elect Engn, Ave Antonio Carlos 6627, BR-31270901 Belo Horizonte, MG, Brazil

[2] Univ Estadual Montes Claros, Grad Program Comp Modeling & Syst, Ave Rui Braga S-N, BR-39401089 Montes Claros, MG, Brazil

[3] Univ Fed Minas Gerais, Dept Elect Engn, Ave Antonio Carlos 6627, BR-31270901 Belo Horizonte, MG, Brazil

Source:

EVOLVING SYSTEMS | 2025 / Vol. 16 / Issue 01

Keywords:

Clustering; Online; Evolving; Similarity-based modeling; Data stream

DOI:

10.1007/s12530-024-09646-w

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

This paper proposes a novel evolving approach based on the Similarity-Based Modeling (SBM), a technique widely used in industrial applications of anomaly detection and multiclass classification. The proposed approach, which inherits from SBM, uses a simple model-matrix composed of historical points to represent each cluster. Its inference procedure for a given input instance consists only of generating an estimate, considering each cluster, and then assigning the input to the most similar cluster according to a novel membership function that considers approximation error and data density. The main features of our approach include a simple and intuitive learning scheme, the ability to model clusters of any shape without using micro-cluster-like procedures, robustness to noisy data, and low computational effort. We evaluate the effectiveness of the proposed approach on fifteen datasets widely used in the literature, assessing its ability to deal with overlapping clusters, clusters with arbitrary shape, noisy data, and high dimensionality. Using Adjusted Rand Index (ARI) and Purity metrics, the proposed algorithm was compared with eight recent state-of-the-art algorithms, and the proposed method achieved the highest performance on most of the datasets. On the remaining datasets, it showed similar performance to other methods. Averaging over the fifteen datasets, our approach achieved an ARI value of 0.8872 and a Purity value of 0.9107. The most competitive method, considering ARI, achieved an average value of 0.6988, and considering Purity, achieved an average value of 0.9257. This shows the effectiveness of the proposed approach.
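The two evaluation metrics named in the abstract can be sketched as follows. This is a minimal illustration of the standard definitions of Purity and the Adjusted Rand Index, not the authors' evaluation code; the function names `purity` and `adjusted_rand_index` and the toy labels are assumptions made for this example.

```python
from collections import Counter
from math import comb


def purity(labels_true, labels_pred):
    """Fraction of points that belong to the majority true class of their cluster."""
    per_cluster = {}
    for t, p in zip(labels_true, labels_pred):
        per_cluster.setdefault(p, Counter())[t] += 1
    return sum(max(c.values()) for c in per_cluster.values()) / len(labels_true)


def adjusted_rand_index(labels_true, labels_pred):
    """Rand index corrected for chance, computed from the contingency table."""
    n = len(labels_true)
    if n < 2:
        return 1.0  # no pairs to compare
    cells = Counter(zip(labels_true, labels_pred))  # contingency table entries
    rows = Counter(labels_true)                     # true class sizes
    cols = Counter(labels_pred)                     # predicted cluster sizes
    sum_cells = sum(comb(v, 2) for v in cells.values())
    sum_rows = sum(comb(v, 2) for v in rows.values())
    sum_cols = sum(comb(v, 2) for v in cols.values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # degenerate case: both partitions are trivial
    return (sum_cells - expected) / (max_index - expected)


# Toy labels (hypothetical): one point of class 0 is placed in cluster 1.
y_true = [0, 0, 0, 1, 1, 1]
y_pred = [0, 0, 1, 1, 1, 1]
print(purity(y_true, y_pred))
print(adjusted_rand_index(y_true, y_pred))
```

Purity does not penalize splitting one class across many clusters, whereas ARI does, which is a common reason for reporting the two together.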

Pages: 30

50 items in total

[41] A Similarity-Based Clustering Algorithm for Fuzzy Data
Hung, Wen-Liang
Yang, Miin-Shen
2010 IEEE INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS (FUZZ-IEEE 2010), 2010,
[42] Similarity-based Fuzzy clustering for user profiling
Castellano, Giovanna
Fanelli, A. Maria
Mencar, Corrado
Torsello, M. Alessandra
PROCEEDING OF THE 2007 IEEE/WIC/ACM INTERNATIONAL CONFERENCE ON WEB INTELLIGENCE AND INTELLIGENT AGENT TECHNOLOGY, WORKSHOPS, 2007, : 75 - 78
[43] Similarity-Based Clustering For IoT Device Classification
Dupont, Guillaume
Leite, Cristoffer
dos Santos, Daniel Ricardo
Costante, Elisa
den Hartog, Jerry
Etalle, Sandro
2021 IEEE INTERNATIONAL CONFERENCE ON OMNI-LAYER INTELLIGENT SYSTEMS (IEEE COINS 2021), 2021, : 104 - 110
[44] Similarity-based clustering for patterns of extreme values
de Carvalho, Miguel
Huser, Raphael
Rubio, Rodrigo
STAT, 2023, 12 (01):
[45] A similarity-based soft clustering algorithm for documents
Lin, KI
Kondadadi, R
SEVENTH INTERNATIONAL CONFERENCE ON DATABASE SYSTEMS FOR ADVANCED APPLICATIONS, PROCEEDINGS, 2001, : 40 - 47
[46] A Cost Function for Similarity-Based Hierarchical Clustering
Dasgupta, Sanjoy
STOC'16: PROCEEDINGS OF THE 48TH ANNUAL ACM SIGACT SYMPOSIUM ON THEORY OF COMPUTING, 2016, : 118 - 127
[48] Extremal clustering in non-stationary random sequences
Auld, Graeme
Papastathopoulos, Ioannis
EXTREMES, 2021, 24 (04) : 725 - 752
[49] SC-OCR: similarity-based clustering and optimum cache replacement approach
Subramanian, Sabitha Malli
Soundarajan, Vijayalakshmi
CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2017, 29 (04):
[50] STREAMING INFERENCE FOR INFINITE NON-STATIONARY CLUSTERING
Schaeffer, Rylan
Liu, Gabrielle Kaili-May
Du, Yilun
Linderman, Scott
Fiete, Ila Rani
CONFERENCE ON LIFELONG LEARNING AGENTS, VOL 199, 2022, 199