Methods for finding frequent items in data streams

Authors
Graham Cormode
Marios Hadjieleftheriou
Affiliation
[1] AT&T Labs–Research
Source
The VLDB Journal, 2010, Vol. 19
Keywords
Data Stream; Hash Function; Frequent Item; Average Relative Error; Heavy Hitter
DOI
Not available
Abstract
The frequent items problem is to process a stream of items and find all items occurring more than a given fraction of the time. It is one of the most heavily studied problems in data stream mining, dating back to the 1980s. Many applications rely directly or indirectly on finding the frequent items, and implementations are in use in large scale industrial systems. However, there has not been much comparison of the different methods under uniform experimental conditions. It is common to find papers touching on this topic in which important related work is mischaracterized, overlooked, or reinvented. In this paper, we aim to present the most important algorithms for this problem in a common framework. We have created baseline implementations of the algorithms and used these to perform a thorough experimental study of their properties. We give empirical evidence that there is considerable variation in the performance of frequent items algorithms. The best methods can be implemented to find frequent items with high accuracy using only tens of kilobytes of memory, at rates of millions of items per second on cheap modern hardware.
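The abstract defines the frequent items problem only in words. The following minimal Python sketch illustrates it with the classic Misra-Gries counter-based algorithm. It is an illustration under stated assumptions, not the paper's own code. The function names `misra_gries` and `frequent_items` and the parameters `k` (number of counters) and `phi` (frequency threshold, 0 < `phi` < 1) are hypothetical. The exact-output helper also assumes the stream can be read a second time to discard false positives.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, Sequence, Set


def misra_gries(stream: Iterable[Hashable], k: int) -> Dict[Hashable, int]:
    """One-pass summary of a stream using at most k counters.

    Each stored count underestimates the item's true frequency by at most
    n / (k + 1), where n is the stream length, so every item that occurs
    more than n / (k + 1) times is guaranteed to remain in the summary.
    """
    counters: Dict[Hashable, int] = {}
    for item in stream:
        if item in counters:
            counters[item] += 1          # already tracked: increment
        elif len(counters) < k:
            counters[item] = 1           # free slot: start tracking
        else:
            # No free slot: decrement every counter (and implicitly discard
            # the new item), dropping any counter that falls to zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


def frequent_items(stream: Sequence[Hashable], phi: float) -> Set[Hashable]:
    """Return exactly the items occurring more than phi * n times.

    Hypothetical helper: assumes the stream can be read twice. The first
    pass finds candidates; the second pass counts them exactly.
    """
    n = len(stream)
    # k = floor(1/phi) gives n/(k+1) < phi*n, so no frequent item is missed.
    k = max(1, int(1 / phi))
    candidates = misra_gries(stream, k)
    exact = Counter(x for x in stream if x in candidates)
    return {x for x, c in exact.items() if c > phi * n}


if __name__ == "__main__":
    data = list("aaaaabbbcd")
    print(misra_gries(data, 3))       # {'a': 4, 'b': 2}
    print(frequent_items(data, 0.3))  # {'a'}
```

The one-pass summary uses memory proportional to `k` regardless of stream length. Its counts are lower bounds on the true frequencies. In a single pass, without the exact recount, it may also report some infrequent items.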
Pages: 3 - 20
Number of pages: 18
Related papers
  • [1] Methods for finding frequent items in data streams
    Cormode, Graham
    Hadjieleftheriou, Marios
    [J]. VLDB JOURNAL, 2010, 19 (01) : 3 - 20
  • [2] Finding frequent items in data streams
    Charikar, M
    Chen, K
    Farach-Colton, M
    [J]. THEORETICAL COMPUTER SCIENCE, 2004, 312 (01) : 3 - 15
  • [3] Finding the Frequent Items in Streams of Data
    Cormode, Graham
    Hadjieleftheriou, Marios
    [J]. COMMUNICATIONS OF THE ACM, 2009, 52 (10) : 97 - 105
  • [4] Finding frequent items in data streams
    Charikar, M
    Chen, K
    Farach-Colton, M
    [J]. AUTOMATA, LANGUAGES AND PROGRAMMING, 2002, 2380 : 693 - 703
  • [5] Finding Frequent Items in Data Streams
    Cormode, Graham
    Hadjieleftheriou, Marios
    [J]. PROCEEDINGS OF THE VLDB ENDOWMENT, 2008, 1 (02) : 1530 - 1541
  • [6] Finding hierarchical frequent items in data streams
    Feng, Wenfeng
    Guo, Qiao
    Zhang, Zhibin
    [J]. WCICA 2006: SIXTH WORLD CONGRESS ON INTELLIGENT CONTROL AND AUTOMATION, VOLS 1-12, CONFERENCE PROCEEDINGS, 2006, : 5972 - +
  • [7] Finding frequent items in data streams using ESBF
    Wang, ShuYun
    Hao, XiuLan
    Xu, HeXiang
    Hu, Yunfa
    [J]. EMERGING TECHNOLOGIES IN KNOWLEDGE DISCOVERY AND DATA MINING, 2007, 4819 : 244 - +
  • [8] Finding (recently) frequent items in distributed data streams
    Manjhi, A
    Shkapenyuk, V
    Dhamdhere, K
    Olston, C
    [J]. ICDE 2005: 21ST INTERNATIONAL CONFERENCE ON DATA ENGINEERING, PROCEEDINGS, 2005, : 767 - 778
  • [9] Finding frequent items of data streams based on hierarchical sketch
    Network Information Center, Beijing Institute of Technology, Beijing 100081, China
    [J]. Beijing Ligong Daxue Xuebao, 2006, 6 : 512 - 516
  • [10] Finding Recently Frequent Items over Online Data Streams
    尹志武
    黄上腾
    [J]. Journal of Donghua University (English Edition), 2006, (06) : 53 - 56