Finding Persistent Items in Data Streams

Cited by: 2
Authors
Dai, Haipeng [1]
Shahzad, Muhammad [2]
Liu, Alex X. [1]
Zhong, Yuankun [1]
Affiliations
[1] Nanjing Univ, State Key Lab Novel Software Technol, Nanjing, Jiangsu, Peoples R China
[2] North Carolina State Univ, Dept Comp Sci, Raleigh, NC 27695 USA
Source
PROCEEDINGS OF THE VLDB ENDOWMENT | 2016 / Vol. 10 / Issue 04
Funding
National Natural Science Foundation of China; National Science Foundation (USA);
Keywords
DOI
Not available
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Frequent item mining, which deals with finding items that occur frequently in a given data stream over a period of time, is one of the most heavily studied problems in data stream mining. A generalized version of frequent item mining is persistent item mining, where a persistent item, unlike a frequent item, does not necessarily occur more frequently than other items over a short period of time, but rather persists and occurs more frequently over a long period of time. To the best of our knowledge, there is no prior work on mining persistent items in a data stream. In this paper, we address the fundamental problem of finding persistent items in a given data stream during a given period of time at any given observation point. We propose a novel scheme, PIE, that can accurately identify each persistent item while keeping the false negative rate (FNR) below any desired target and using a very small amount of memory. The key idea of PIE is to use Raptor codes to encode the ID of each item that appears at the observation point during a measurement period, and to store only a few bits of the encoded ID in the memory of that observation point for that measurement period. A persistent item occurs in enough measurement periods that enough encoded bits of its ID can be retrieved from the observation point to decode them correctly and recover the ID. We implemented and extensively evaluated PIE using three real network traffic traces and compared its performance with two adapted prior schemes. Our results show that PIE not only achieves the desired FNR in every scenario, but its FNR is also, on average, 19.5 times smaller than that of the best adapted prior art.
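The encode-a-few-bits-per-period idea described in the abstract can be illustrated with a toy sketch. This is not the PIE scheme itself and does not use Raptor codes: as a simplifying assumption, it swaps in a random linear fountain code over GF(2), stores exactly one coded bit per measurement period, and ignores hashing, collisions, and the memory layout. The names `ID_BITS`, `period_mask`, `coded_bit`, and `decode` are hypothetical and chosen only for this illustration.

```python
import random

ID_BITS = 16  # assumed ID width for this sketch


def period_mask(period):
    """Pseudo-random nonzero bit mask for a period (identical at encode and decode time)."""
    rng = random.Random(period)
    return rng.randrange(1, 1 << ID_BITS)


def coded_bit(item_id, period):
    """One coded bit: parity of the ID bits selected by this period's mask."""
    return bin(item_id & period_mask(period)).count("1") & 1


def decode(observations):
    """Recover an ID from {period: coded bit} via Gaussian elimination over GF(2).

    Returns None when the stored bits do not yet determine the ID uniquely.
    """
    pivots = {}  # highest set bit position -> (mask, bit)
    for period, bit in observations.items():
        mask = period_mask(period)
        # Reduce the new equation against existing pivot rows, high to low.
        for pos in sorted(pivots, reverse=True):
            if (mask >> pos) & 1:
                pivot_mask, pivot_bit = pivots[pos]
                mask ^= pivot_mask
                bit ^= pivot_bit
        if mask:
            pivots[mask.bit_length() - 1] = (mask, bit)
    if len(pivots) < ID_BITS:
        return None  # too few independent coded bits
    # Back-substitute from the lowest bit position upward.
    item_id = 0
    for pos in sorted(pivots):
        mask, bit = pivots[pos]
        lower = mask & ((1 << pos) - 1)
        value = bit ^ (bin(item_id & lower).count("1") & 1)
        item_id |= value << pos
    return item_id


item = 0xBEEF
persistent = {t: coded_bit(item, t) for t in range(64)}  # seen in 64 periods
transient = {t: coded_bit(item, t) for t in range(5)}    # seen in only 5 periods
print(decode(persistent) == item, decode(transient))
```

In this toy setting, an item seen in only a handful of periods leaves too few coded bits to solve for its ID, while an item seen in many periods almost always leaves enough. That mirrors, in a simplified way, why only persistent items become decodable.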
Pages: 289-300
Number of pages: 12
Related papers
50 in total
  • [1] Finding frequent items in data streams
    Charikar, M
    Chen, K
    Farach-Colton, M
    THEORETICAL COMPUTER SCIENCE, 2004, 312 (01) : 3 - 15
  • [2] Finding the Frequent Items in Streams of Data
    Cormode, Graham
    Hadjieleftheriou, Marios
    COMMUNICATIONS OF THE ACM, 2009, 52 (10) : 97 - 105
  • [3] Finding frequent items in data streams
    Charikar, M
    Chen, K
    Farach-Colton, M
    AUTOMATA, LANGUAGES AND PROGRAMMING, 2002, 2380 : 693 - 703
  • [4] Finding Significant Items in Data Streams
    Yang, Tong
    Zhang, Haowei
    Yang, Dongsheng
    Huang, Yucheng
    Li, Xiaoming
    2019 IEEE 35TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2019), 2019 : 1394 - 1405
  • [5] Finding Frequent Items in Data Streams
    Cormode, Graham
    Hadjieleftheriou, Marios
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2008, 1 (02) : 1530 - 1541
  • [6] Methods for finding frequent items in data streams
    Graham Cormode
    Marios Hadjieleftheriou
    The VLDB Journal, 2010, 19 : 3 - 20
  • [7] Methods for finding frequent items in data streams
    Cormode, Graham
    Hadjieleftheriou, Marios
    VLDB JOURNAL, 2010, 19 (01) : 3 - 20
  • [8] Finding hierarchical frequent items in data streams
    Feng, Wenfeng
    Guo, Qiao
    Zhang, Zhibin
    WCICA 2006: SIXTH WORLD CONGRESS ON INTELLIGENT CONTROL AND AUTOMATION, VOLS 1-12, CONFERENCE PROCEEDINGS, 2006, : 5972 - +
  • [9] PeriodicSketch: Finding Periodic Items in Data Streams
    Fan, Zhuochen
    Zhang, Yinda
    Yang, Tong
    Yan, Mingyi
    Wen, Gang
    Wu, Yuhan
    Li, Hongze
    Cui, Bin
    2022 IEEE 38TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2022), 2022 : 96 - 109
  • [10] Identifying and Estimating Persistent Items in Data Streams
    Dai, Haipeng
    Shahzad, Muhammad
    Liu, Alex X.
    Li, Meng
    Zhong, Yuankun
    Chen, Guihai
    IEEE-ACM TRANSACTIONS ON NETWORKING, 2018, 26 (06) : 2429 - 2442