Finding top-k elements in data streams

Cited by: 42
Authors
Homem, Nuno [1]
Carvalho, Joao Paulo [1]
Affiliations
[1] INESC ID, TULisbon Inst Super Tecn, P-1000029 Lisbon, Portugal
Keywords
Approximate algorithms; Top-k algorithms; Most frequent; Estimation; Data stream frequencies; FREQUENT ITEMSETS
DOI
10.1016/j.ins.2010.08.024
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Subject classification code
0812
Abstract
Identifying the most frequent elements in a data stream is a well-known and difficult problem. Identifying the most frequent elements for each individual, especially in very large populations, is even harder. The use of fast algorithms with a small memory footprint is paramount when the number of individuals is very large. In many situations such analysis needs to be performed and kept up to date in near real time. Fortunately, approximate answers are usually adequate when dealing with this problem. This paper presents a new algorithm that addresses this problem by merging the commonly used counter-based and sketch-based techniques for top-k identification. The algorithm provides the top-k list of elements, their frequencies and an error estimate for each frequency value. It also provides strong guarantees on the error estimate, the order of elements and the inclusion of elements in the list depending on their real frequency. Additionally, the algorithm provides stochastic bounds on the error and expected error estimates. Telecommunications customers' behavior and voice call data are used to present concrete results obtained with this algorithm and to illustrate improvements over previously existing algorithms. © 2010 Elsevier Inc. All rights reserved.
Pages: 4958-4974
Number of pages: 17
Related papers
50 in total
  • [1] Efficient computation of frequent and top-k elements in data streams
    Metwally, A
    Agrawal, D
    El Abbadi, A
    DATABASE THEORY - ICDT 2005, PROCEEDINGS, 2005, 3363 : 398 - 412
  • [2] LUSketch: A Fast and Precise Sketch for top-k Finding in Data Streams
    Lu, Jie
    Chen, Hongchang
    Zhang, Zhen
    2022 31ST INTERNATIONAL CONFERENCE ON COMPUTER COMMUNICATIONS AND NETWORKS (ICCCN 2022), 2022,
  • [3] WavingSketch: an unbiased and generic sketch for finding top-k items in data streams
    Liu, Zirui
    Dong, Fenghao
    Liu, Chengwu
    Deng, Xiangwei
    Yang, Tong
    Zhao, Yikai
    Li, Jizhou
    Cui, Bin
    Zhang, Gong
    VLDB JOURNAL, 2024, 33 (05): 1697 - 1722
  • [4] Efficiently Finding Top-K Items from Evolving Distributed Data Streams
    Qi, Baoyuan
    Ma, Gang
    Shi, Zhongzhi
    Wang, Wei
    2014 10TH INTERNATIONAL CONFERENCE ON SEMANTICS, KNOWLEDGE AND GRIDS (SKG), 2014: 137 - 140
  • [5] WavingSketch: An Unbiased and Generic Sketch for Finding Top-k Items in Data Streams
    Li, Jizhou
    Li, Zikun
    Xu, Yifei
    Jiang, Shiqi
    Yang, Tong
    Cui, Bin
    Dai, Yafei
    Zhang, Gong
    KDD '20: PROCEEDINGS OF THE 26TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING, 2020: 1574 - 1584
  • [6] An integrated efficient solution for computing frequent and top-k elements in data streams
    Metwally, Ahmed
    Agrawal, Divyakant
    El Abbadi, Amr
    ACM TRANSACTIONS ON DATABASE SYSTEMS, 2006, 31 (03): 1095 - 1133
  • [7] SSS: An Accurate and Fast Algorithm for Finding Top-k Hot Items in Data Streams
    Gong, Junzhi
    Tian, Deyu
    Yang, Dongsheng
    Yang, Tong
    Dai, Tuo
    Cui, Bin
    Li, Xiaoming
    2018 IEEE INTERNATIONAL CONFERENCE ON BIG DATA AND SMART COMPUTING (BIGCOMP), 2018: 106 - 113
  • [8] Comments on "An Integrated Efficient Solution for Computing Frequent and Top-k Elements in Data Streams"
    Liu, Hongyan
    Wang, Xiaoyu
    Yang, Yinghui
    ACM TRANSACTIONS ON DATABASE SYSTEMS, 2010, 35 (02):
  • [9] Top-k Correlated Subgraph Query for Data Streams
    Pan, Shirui
    Zhu, Xingquan
    Fang, Meng
    2012 21ST INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR 2012), 2012: 2906 - 2909
  • [10] Finding top-k elements in a time-sliding window
    Homem N.
    Carvalho J.P.
    Evolving Systems, 2011, 2 (01) : 51 - 70