Online Clustering for Novelty Detection and Concept Drift in Data Streams

Cited by: 6
Authors
Garcia, Kemilly Dearo [1, 2]
Poel, Mannes [1 ]
Kok, Joost N. [1 ]
de Carvalho, Andre C. P. L. F. [2 ]
Affiliations
[1] Univ Twente, Enschede, Netherlands
[2] Univ Sao Paulo, ICMC, Sao Paulo, Brazil
Keywords
Data stream; Concept drift; Novelty detection; Online learning; Classification
DOI
10.1007/978-3-030-30244-3_37
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Data streams involve large amounts of data that arrive continuously, with a probability distribution that may change over time. Depending on how the distribution changes, different phenomena can occur: new classes can appear, or concept drift can occur in existing classes. Machine learning algorithms are often used to model such data. New classes are patterns that were not seen when the current classification model was trained but that appear after some time. Concept drift occurs when the concepts associated with a dataset change as new data arrive. This paper proposes a new kNN-based algorithm that uses micro-clusters as prototypes and incrementally updates them, or creates new micro-clusters, when novelties are detected. In the online phase, each instance close to a micro-cluster is treated as an extension of that micro-cluster and is used to adapt the model to concept drift. The proposed algorithm is compared experimentally with a state-of-the-art classifier from the data stream literature and with one baseline. According to the experimental results, the proposed algorithm increases its predictive performance over time by incrementally learning changes in the data distribution.
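The abstract gives only a high-level description of the method, so the following Python sketch is an illustration of the general idea and not the authors' algorithm. It assumes Euclidean distance, a single fixed radius for all micro-clusters, nearest-prototype (1-NN) prediction, and a simple buffer rule for declaring a novelty. The names MicroCluster, MicroClusterKNN, radius, and min_novel are hypothetical and do not come from the paper.

```python
import math
from dataclasses import dataclass


@dataclass
class MicroCluster:
    centroid: list   # running mean of the absorbed instances
    label: str       # class label (or a generated novelty label)
    n: int = 1       # number of absorbed instances

    def absorb(self, x):
        # Incremental mean update: c <- c + (x - c) / n
        self.n += 1
        self.centroid = [c + (xi - c) / self.n
                         for c, xi in zip(self.centroid, x)]


class MicroClusterKNN:
    """Illustrative sketch only; all thresholds are assumptions."""

    def __init__(self, radius, min_novel=3):
        self.radius = radius        # assumed: max distance to extend a cluster
        self.min_novel = min_novel  # assumed: buffered neighbours needed for a novelty
        self.clusters = []
        self.buffer = []            # instances not explained by any cluster
        self.novel_count = 0

    def fit_initial(self, X, y):
        # Offline phase: summarise labelled data as per-class micro-clusters.
        for x, label in zip(X, y):
            x = list(x)
            same = [mc for mc in self.clusters if mc.label == label]
            if same:
                mc = min(same, key=lambda m: math.dist(m.centroid, x))
                if math.dist(mc.centroid, x) <= self.radius:
                    mc.absorb(x)
                    continue
            self.clusters.append(MicroCluster(x, label))

    def process(self, x):
        # Online phase: classify one instance and adapt the model.
        x = list(x)
        mc = min(self.clusters, key=lambda m: math.dist(m.centroid, x))
        if math.dist(mc.centroid, x) <= self.radius:
            mc.absorb(x)            # extension of an existing micro-cluster
            return mc.label
        self.buffer.append(x)
        close = [b for b in self.buffer if math.dist(b, x) <= self.radius]
        if len(close) < self.min_novel:
            return "unknown"
        # Enough unexplained instances lie close together: new micro-cluster.
        self.novel_count += 1
        label = f"novel_{self.novel_count}"
        centroid = [sum(col) / len(close) for col in zip(*close)]
        self.clusters.append(MicroCluster(centroid, label, n=len(close)))
        self.buffer = [b for b in self.buffer if b not in close]
        return label
```

In this sketch, an instance that falls within radius of its nearest micro-cluster shifts that centroid slightly, which is one simple way to follow gradual drift. Instances that fit no cluster are held back until enough of them lie close together to form a new micro-cluster.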
Pages: 448-459
Number of pages: 12