Methods for finding frequent items in data streams

Authors
Graham Cormode
Marios Hadjieleftheriou
Affiliation
[1] AT&T Labs–Research
Source
The VLDB Journal | 2010 / Vol. 19
Keywords
Data Stream; Hash Function; Frequent Item; Average Relative Error; Heavy Hitter;
DOI
Not available
Chinese Library Classification number
Subject classification number
Abstract
The frequent items problem is to process a stream of items and find all items occurring more than a given fraction of the time. It is one of the most heavily studied problems in data stream mining, dating back to the 1980s. Many applications rely directly or indirectly on finding the frequent items, and implementations are in use in large scale industrial systems. However, there has not been much comparison of the different methods under uniform experimental conditions. It is common to find papers touching on this topic in which important related work is mischaracterized, overlooked, or reinvented. In this paper, we aim to present the most important algorithms for this problem in a common framework. We have created baseline implementations of the algorithms and used these to perform a thorough experimental study of their properties. We give empirical evidence that there is considerable variation in the performance of frequent items algorithms. The best methods can be implemented to find frequent items with high accuracy using only tens of kilobytes of memory, at rates of millions of items per second on cheap modern hardware.
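The problem statement in the abstract can be illustrated with a minimal sketch. It uses the classic counter-based Misra-Gries scheme from the 1980s, followed by an exact verification pass. This is not necessarily one of the implementations evaluated in the paper; the function names `misra_gries` and `frequent_items` and the threshold parameter `phi` are assumptions chosen for illustration.

```python
from collections import Counter
from math import ceil


def misra_gries(stream, k):
    """Keep at most k - 1 counters; every item occurring more than
    n / k times in a stream of length n is guaranteed to survive."""
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement all and drop those reaching zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


def frequent_items(items, phi):
    """Return items occurring more than phi * n times, with exact counts.
    Assumes `items` is a list, so it can be scanned a second time."""
    k = ceil(1 / phi)
    candidates = misra_gries(items, k)
    # Second pass: count only the surviving candidates exactly.
    exact = Counter(x for x in items if x in candidates)
    threshold = phi * len(items)
    return {x: c for x, c in exact.items() if c > threshold}


if __name__ == "__main__":
    print(frequent_items(list("abacabadabae"), 0.25))
```

The first pass stores only a small, fixed number of counters regardless of stream length, which is the property that makes such summaries attractive for streams. The second pass is used here only to remove false positives and needs the data to be replayable; in a true one-pass setting the counters themselves serve as approximate answers.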
Pages: 3-20
Page count: 17
Related papers
50 in total
  • [21] Finding frequent items in a turnstile data stream
    Hung, Regant Y. S.
    Lai, Kwok Fai
    Ting, Hing Fung
    [J]. COMPUTING AND COMBINATORICS, PROCEEDINGS, 2008, 5092 : 498 - 509
  • [22] PeriodicSketch: Finding Periodic Items in Data Streams
    Fan, Zhuochen
    Zhang, Yinda
    Yang, Tong
    Yan, Mingyi
    Wen, Gang
    Wu, Yuhan
    Li, Hongze
    Cui, Bin
    [J]. 2022 IEEE 38TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2022), 2022, : 96 - 109
  • [23] Efficiently discovering recent frequent items in data streams
    Tantono, Ferry Irawan
    Manerikar, Nishad
    Palpanas, Themis
    [J]. SCIENTIFIC AND STATISTICAL DATABASE MANAGEMENT, PROCEEDINGS, 2008, 5069 : 222 - +
  • [24] Processing frequent items over distributed data streams
    Zhang, DD
    Li, JZ
    Wang, WP
    Guo, LJ
    Ai, CY
    [J]. WEB TECHNOLOGIES RESEARCH AND DEVELOPMENT - APWEB 2005, 2005, 3399 : 523 - 529
  • [25] FIDS: Monitoring frequent items over distributed data streams
    Fuller, Robert
    Kantardzic, Mehmed
    [J]. MACHINE LEARNING AND DATA MINING IN PATTERN RECOGNITION, PROCEEDINGS, 2007, 4571 : 464 - +
  • [26] Find recent frequent items with sliding windows in data streams
    Ren, Jiadong
    Li, Ke
    [J]. 2007 THIRD INTERNATIONAL CONFERENCE ON INTELLIGENT INFORMATION HIDING AND MULTIMEDIA SIGNAL PROCESSING, VOL II, PROCEEDINGS, 2007, : 625 - 628
  • [27] Finding frequent itemsets over online data streams
    Chang, Joong Hyuk
    Lee, Won Suk
    [J]. INFORMATION AND SOFTWARE TECHNOLOGY, 2006, 48 (07) : 606 - 618
  • [28] Finding frequent items in parallel
    Cafaro, Massimo
    Tempesta, Piergiulio
    [J]. CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2011, 23 (15) : 1774 - 1788
  • [29] A Mining Algorithm of Frequent Items in Data Streams Based on Apache Storm
    Hu, Weihua
    Guo, Ziang
    Chen, Mingzhong
    [J]. PROCEEDINGS OF THE 4TH INTERNATIONAL CONFERENCE ON MECHATRONICS, MATERIALS, CHEMISTRY AND COMPUTER ENGINEERING 2015 (ICMMCCE 2015), 2015, 39 : 2926 - 2930
  • [30] A High-Performance Algorithm for Identifying Frequent Items in Data Streams
    Anderson, Daniel
    Bevan, Pryce
    Lang, Kevin
    Liberty, Edo
    Rhodes, Lee
    Thaler, Justin
    [J]. PROCEEDINGS OF THE 2017 INTERNET MEASUREMENT CONFERENCE (IMC'17), 2017, : 268 - 282