Finding frequent items in data streams

Authors
Charikar, M [1]
Chen, K
Farach-Colton, M
Affiliations
[1] Princeton Univ, Princeton, NJ 08544 USA
[2] Univ Calif Berkeley, Berkeley, CA 94720 USA
[3] Rutgers State Univ, Piscataway, NJ 08855 USA
DOI
Not available
Chinese Library Classification number
TP301 [Theory and methods];
Discipline classification code
081202;
Abstract
We present a 1-pass algorithm for estimating the most frequent items in a data stream using very limited storage space. Our method relies on a novel data structure called a COUNT SKETCH, which allows us to estimate the frequencies of all the items in the stream. Our algorithm achieves better space bounds than the previous best known algorithms for this problem for many natural distributions on the item frequencies. In addition, our algorithm leads directly to a 2-pass algorithm for the problem of estimating the items with the largest (absolute) change in frequency between two data streams. To our knowledge, this problem has not been previously studied in the literature.
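The idea behind the COUNT SKETCH named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class name `CountSketch`, the `depth` and `width` parameters, and the salted BLAKE2b hashing (a stand-in for the pairwise-independent hash functions assumed in the analysis) are all choices made here for illustration. Each row hashes an item to one counter and to a sign of +1 or -1; an item's frequency is estimated as the median of its signed counters across rows.

```python
import hashlib
import statistics


class CountSketch:
    """Illustrative Count Sketch: `depth` rows of `width` signed counters."""

    def __init__(self, depth=5, width=1024):
        self.depth = depth
        self.width = width
        self.table = [[0] * width for _ in range(depth)]

    def _bucket_and_sign(self, row, item):
        # Assumption of this sketch: one salted BLAKE2b digest per row stands in
        # for the per-row bucket hash and sign hash.
        digest = hashlib.blake2b(
            repr(item).encode("utf-8"),
            digest_size=8,
            salt=row.to_bytes(8, "little"),
        ).digest()
        h = int.from_bytes(digest, "little")
        sign = 1 if h & 1 else -1
        bucket = (h >> 1) % self.width
        return bucket, sign

    def update(self, item, count=1):
        # Add the signed count to one counter in every row.
        for row in range(self.depth):
            bucket, sign = self._bucket_and_sign(row, item)
            self.table[row][bucket] += sign * count

    def estimate(self, item):
        # Each row gives one noisy estimate; the median damps collisions.
        votes = []
        for row in range(self.depth):
            bucket, sign = self._bucket_and_sign(row, item)
            votes.append(sign * self.table[row][bucket])
        return statistics.median(votes)


# One pass over a toy stream: two frequent items plus many rare ones.
sketch = CountSketch()
sketch.update("a", 1000)
sketch.update("b", 500)
for i in range(200):
    sketch.update(f"rare-{i}")
print(sketch.estimate("a"), sketch.estimate("b"))

# Updates are additive, so feeding one stream with count -1 and the other
# with count +1 yields a sketch of the frequency change between the two.
change = CountSketch()
change.update("x", -100)  # "x" occurs 100 times in the first stream
change.update("x", 900)   # and 900 times in the second
print(change.estimate("x"))
```

Because the table is updated additively, the same structure can summarize the difference between two streams, which is the property the abstract's change-detection result builds on. How items are then selected in a second pass is not shown here.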
Pages: 693-703
Page count: 11
Related papers
  • [1] Finding frequent items in data streams
    Charikar, M
    Chen, K
    Farach-Colton, M
    [J]. THEORETICAL COMPUTER SCIENCE, 2004, 312 (01) : 3 - 15
  • [2] Finding the Frequent Items in Streams of Data
    Cormode, Graham
    Hadjieleftheriou, Marios
    [J]. COMMUNICATIONS OF THE ACM, 2009, 52 (10) : 97 - 105
  • [3] Finding Frequent Items in Data Streams
    Cormode, Graham
    Hadjieleftheriou, Marios
    [J]. PROCEEDINGS OF THE VLDB ENDOWMENT, 2008, 1 (02) : 1530 - 1541
  • [4] Methods for finding frequent items in data streams
    Graham Cormode
    Marios Hadjieleftheriou
    [J]. The VLDB Journal, 2010, 19 : 3 - 20
  • [5] Finding hierarchical frequent items in data streams
    Feng, Wenfeng
    Guo, Qiao
    Zhang, Zhibin
    [J]. WCICA 2006: SIXTH WORLD CONGRESS ON INTELLIGENT CONTROL AND AUTOMATION, VOLS 1-12, CONFERENCE PROCEEDINGS, 2006, : 5972 - +
  • [6] Methods for finding frequent items in data streams
    Cormode, Graham
    Hadjieleftheriou, Marios
    [J]. VLDB JOURNAL, 2010, 19 (01) : 3 - 20
  • [7] Finding frequent items in data streams using ESBF
    Wang, ShuYun
    Hao, XiuLan
    Xu, HeXiang
    Hu, Yunfa
    [J]. EMERGING TECHNOLOGIES IN KNOWLEDGE DISCOVERY AND DATA MINING, 2007, 4819 : 244 - +
  • [8] Finding (recently) frequent items in distributed data streams
    Manjhi, A
    Shkapenyuk, V
    Dhamdhere, K
    Olston, C
    [J]. ICDE 2005: 21ST INTERNATIONAL CONFERENCE ON DATA ENGINEERING, PROCEEDINGS, 2005, : 767 - 778
  • [9] Finding frequent items of data streams based on hierarchical sketch
    Network Information Center, Beijing Institute of Technology, Beijing 100081, China
    [J]. Beijing Ligong Daxue Xuebao, 2006, 6 (512-516):
  • [10] Finding Recently Frequent Items over Online Data Streams
    尹志武
    黄上腾
    [J]. Journal of Donghua University(English Edition), 2006, (06) : 53 - 56