Efficient Data Stream Clustering with Sliding Windows based on Locality-Sensitive Hashing

Cited by: 16
Authors
Youn, Jonghem [1]
Shim, Junho [2]
Lee, Sang-Goo [3]
Affiliations
[1] Voost Inc, Seoul 06232, South Korea
[2] Sookmyung Womens Univ, Dept Comp Sci, Seoul 04310, South Korea
[3] Seoul Natl Univ, Dept Comp Sci & Engn, Seoul 08826, South Korea
Source
IEEE Access, 2018, Vol. 6
Funding
National Research Foundation of Singapore
Keywords
Data stream; k-means clustering; locality-sensitive hashing; sliding window; EVOLVING DATA STREAMS; AFFINITY PROPAGATION
DOI
10.1109/ACCESS.2018.2877138
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
Data stream clustering over sliding windows generates clusters as the window moves. However, iteratively clustering all the data in a window is highly inefficient in terms of both memory use and computational load. In this paper, we improve data stream clustering over sliding windows using sliding window aggregation and nearest neighbor search techniques. Our algorithm uses sliding window aggregation to construct and maintain temporal group features as a summary of the window. To keep the summary at a constant size, the algorithm shrinks it by merging group features with their nearest neighbors, and it uses locality-sensitive hashing to find those neighbors quickly. We also propose a re-clustering policy that determines whether to append a new summary to the existing clusters or to re-cluster the entire summary. Experiments on real-world and synthetic datasets demonstrate that our algorithm can significantly improve continuous clustering on data streams with sliding windows.
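The abstract describes keeping the window summary at a fixed size by merging nearest neighbors that are found with locality-sensitive hashing. The Python sketch below illustrates only that step, under stated assumptions, and is not the authors' algorithm. It assumes that each temporal group feature is a BIRCH-style triple (count, linear sum, squared sum). It uses a single table of Euclidean (p-stable) LSH with hypothetical parameters `k` and `w`, and it merges the closest pair found inside any bucket, falling back to an exhaustive search when no bucket holds two features. The names `GroupFeature`, `PStableLSH`, `closest_pair`, and `compress` are hypothetical. Timestamps, window expiration, and the re-clustering policy are omitted.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class GroupFeature:
    """Hypothetical window summary: count, linear sum, and squared sum (BIRCH-style)."""
    n: int
    ls: list[float]
    ss: float

    @classmethod
    def from_point(cls, x):
        return cls(1, list(x), sum(v * v for v in x))

    def centroid(self):
        return [v / self.n for v in self.ls]

    def radius(self):
        # Root-mean-square distance of the members from the centroid.
        c2 = sum(v * v for v in self.centroid())
        return math.sqrt(max(self.ss / self.n - c2, 0.0))

    def merge(self, other):
        # Summaries are additive, so merging is component-wise addition.
        return GroupFeature(self.n + other.n,
                            [a + b for a, b in zip(self.ls, other.ls)],
                            self.ss + other.ss)


class PStableLSH:
    """One table of Euclidean LSH: h_i(x) = floor((a_i . x + b_i) / w), a_i Gaussian."""

    def __init__(self, dim, k, w, seed=0):
        rng = random.Random(seed)
        self.a = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(k)]
        self.b = [rng.uniform(0.0, w) for _ in range(k)]
        self.w = w

    def key(self, x):
        return tuple(
            math.floor((sum(ai * xi for ai, xi in zip(a, x)) + b) / self.w)
            for a, b in zip(self.a, self.b)
        )


def closest_pair(group):
    """Return (distance, i, j) for the closest pair in a list of (index, feature)."""
    best = None
    for p in range(len(group)):
        for q in range(p + 1, len(group)):
            (i, fi), (j, fj) = group[p], group[q]
            d = math.dist(fi.centroid(), fj.centroid())
            if best is None or d < best[0]:
                best = (d, i, j)
    return best


def compress(features, capacity, lsh):
    """Merge close pairs until at most `capacity` group features remain."""
    if capacity < 1:
        raise ValueError("capacity must be at least 1")
    features = list(features)
    while len(features) > capacity:
        # Hash every feature by its centroid; only same-bucket pairs are compared.
        buckets = {}
        for idx, f in enumerate(features):
            buckets.setdefault(lsh.key(f.centroid()), []).append((idx, f))
        candidates = [b for b in buckets.values() if len(b) >= 2]
        if not candidates:  # no collisions: fall back to an exhaustive search
            candidates = [list(enumerate(features))]
        _, i, j = min(closest_pair(b) for b in candidates)
        merged = features[i].merge(features[j])
        features = [f for k, f in enumerate(features) if k not in (i, j)]
        features.append(merged)
    return features


if __name__ == "__main__":
    rng = random.Random(1)
    points = [(rng.gauss(c, 0.1), rng.gauss(c, 0.1)) for c in (0.0, 5.0) for _ in range(50)]
    lsh = PStableLSH(dim=2, k=3, w=1.0, seed=42)
    summary = compress([GroupFeature.from_point(p) for p in points], capacity=10, lsh=lsh)
    print(len(summary), sum(f.n for f in summary))  # 10 100
```

Restricting the pair search to LSH buckets avoids comparing every pair of group features, at the cost of occasionally missing the true closest pair when its two members fall into different buckets; practical LSH schemes reduce this risk by using several hash tables.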
Pages: 63757-63776
Number of pages: 20