Efficient Data Stream Clustering with Sliding Windows based on Locality-Sensitive Hashing

Cited by: 16
Authors
Youn, Jonghem [1]
Shim, Junho [2]
Lee, Sang-Goo [3]
Affiliations
[1] Voost Inc, Seoul 06232, South Korea
[2] Sookmyung Womens Univ, Dept Comp Sci, Seoul 04310, South Korea
[3] Seoul Natl Univ, Dept Comp Sci & Engn, Seoul 08826, South Korea
Source
IEEE ACCESS | 2018 / Vol. 6
Funding
National Research Foundation, Singapore;
Keywords
Data stream; k-means clustering; locality-sensitive hashing; sliding window; EVOLVING DATA STREAMS; AFFINITY PROPAGATION;
DOI
10.1109/ACCESS.2018.2877138
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Subject classification code
0812;
Abstract
Data stream clustering over sliding windows generates clusters as the window moves. However, iterative clustering using all data in a window is highly inefficient in terms of memory use and computational load. In this paper, we improve data stream clustering over sliding windows using sliding window aggregation and nearest neighbor search techniques. Our algorithm constructs and maintains temporal group features as a summary of the window using the sliding window aggregation technique. In order to maintain a constant size for the summary, the algorithm reduces the size of the summary by joining the nearest neighbor. We exploit locality-sensitive hashing for rapid nearest neighbor searching. In addition, we also suggest a re-clustering policy that determines whether to append a new summary to pre-existing clusters or to perform clustering on the whole summary. We conduct experiments on real-world and synthetic datasets in order to demonstrate that our algorithm can significantly improve continuous clustering on data streams with sliding windows.
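The summary-shrinking step described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not define the temporal group features, so a plain (count, linear sum) pair stands in for them, and the p-stable hash (`make_hash`, with hypothetical `bits` and `width` parameters) is one assumed choice of locality-sensitive hash for Euclidean distance. The names `shrink`, `merge`, and `closest_pair` are also illustrative.

```python
import math
import random
from collections import defaultdict
from itertools import combinations

def centroid(feature):
    # A feature is (count, linear_sum); its centroid is linear_sum / count.
    n, linear_sum = feature
    return [s / n for s in linear_sum]

def merge(a, b):
    # Features are additive: counts and linear sums add component-wise.
    return (a[0] + b[0], [x + y for x, y in zip(a[1], b[1])])

def make_hash(dim, bits, width, rng):
    # Assumed p-stable LSH: h(x) = floor((a . x + b) / width), a ~ N(0, 1).
    planes = [([rng.gauss(0, 1) for _ in range(dim)], rng.uniform(0, width))
              for _ in range(bits)]
    def signature(point):
        return tuple(
            math.floor((sum(a * x for a, x in zip(vec, point)) + b) / width)
            for vec, b in planes
        )
    return signature

def closest_pair(features, groups):
    # Exact closest pair of centroids, searched only inside each group.
    best = None
    for group in groups:
        for i, j in combinations(group, 2):
            d = math.dist(centroid(features[i]), centroid(features[j]))
            if best is None or d < best[0]:
                best = (d, i, j)
    return best

def shrink(features, max_size, signature):
    # Merge approximate nearest neighbours until the summary fits max_size.
    features = list(features)
    while len(features) > max(max_size, 1):
        buckets = defaultdict(list)
        for idx, f in enumerate(features):
            buckets[signature(centroid(f))].append(idx)
        best = closest_pair(features, buckets.values())
        if best is None:
            # No bucket holds two features: fall back to an exact search.
            best = closest_pair(features, [range(len(features))])
        _, i, j = best
        merged = merge(features[i], features[j])
        features = [f for k, f in enumerate(features) if k not in (i, j)]
        features.append(merged)
    return features
```

Because the hash only restricts which pairs are compared, the merged pair is the nearest one within a bucket, which may differ from the globally nearest pair. That trade of exactness for speed is the usual reason to use locality-sensitive hashing here.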
Pages: 63757-63776
Page count: 20
Related papers
50 in total
  • [1] Locality-Sensitive Hashing Optimizations for Fast Malware Clustering
    Oprisa, Ciprian
    Checiches, Marius
    Nandrean, Adrian
    [J]. 2014 IEEE INTERNATIONAL CONFERENCE ON INTELLIGENT COMPUTER COMMUNICATION AND PROCESSING (ICCP), 2014, : 97 - +
  • [2] An adaptive mean shift clustering algorithm based on locality-sensitive hashing
    Zhang, Xinhong
    Cui, Yanbin
    Li, Duoyi
    Liu, Xianxing
    Zhang, Fan
    [J]. OPTIK, 2012, 123 (20) : 1891 - 1894
  • [3] A Novel Cluster Prediction Approach Based on Locality-Sensitive Hashing for Fuzzy Clustering of Categorical Data
    Toan Nguyen Mau
    Inoguchi, Yasushi
    Van-Nam Huynh
    [J]. IEEE ACCESS, 2022, 10 : 34196 - 34206
  • [4] In Defense of Locality-Sensitive Hashing
    Ding, Kun
    Huo, Chunlei
    Fan, Bin
    Xiang, Shiming
    Pan, Chunhong
    [J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2018, 29 (01) : 87 - 103
  • [5] Kernelized Locality-Sensitive Hashing
    Kulis, Brian
    Grauman, Kristen
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2012, 34 (06) : 1092 - 1104
  • [6] Fast hierarchical clustering algorithm using locality-sensitive hashing
    Koga, H
    Ishibashi, T
    Watanabe, T
    [J]. DISCOVERY SCIENCE, PROCEEDINGS, 2004, 3245 : 114 - 128
  • [7] Correlated Locality-Sensitive Hashing
    Pagh, Rasmus
    [J]. ALGORITHMS - ESA 2015, 2015, 9294
  • [8] Efficient locality-sensitive hashing over high-dimensional streaming data
    Wang, Hao
    Yang, Chengcheng
    Zhang, Xiangliang
    Gao, Xin
    [J]. NEURAL COMPUTING & APPLICATIONS, 2023, 35 (05) : 3753 - 3766
  • [9] Efficient Locality-Sensitive Hashing Over High-Dimensional Data Streams
    Yang, Chengcheng
    Deng, Dong
    Shang, Shuo
    Shao, Ling
    [J]. 2020 IEEE 36TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2020), 2020, : 1994 - 1997
  • [10] Efficient locality-sensitive hashing over high-dimensional streaming data
    Hao Wang
    Chengcheng Yang
    Xiangliang Zhang
    Xin Gao
    [J]. Neural Computing and Applications, 2023, 35 : 3753 - 3766