Finding Persistent Items in Data Streams

Cited by: 2
Authors
Dai, Haipeng [1]
Shahzad, Muhammad [2]
Liu, Alex X. [1]
Zhong, Yuankun [1]
Affiliations
[1] Nanjing Univ, State Key Lab Novel Software Technol, Nanjing, Jiangsu, Peoples R China
[2] North Carolina State Univ, Dept Comp Sci, Raleigh, NC 27695 USA
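Source
PROCEEDINGS OF THE VLDB ENDOWMENT | 2016, Vol. 10, No. 4
Funding
National Natural Science Foundation of China; US National Science Foundation;
Keywords
DOI
Not available
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract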
Frequent item mining, which deals with finding items that occur frequently in a given data stream over a period of time, is one of the most heavily studied problems in data stream mining. A generalized version of frequent item mining is persistent item mining, where a persistent item, unlike a frequent item, does not necessarily occur more frequently than other items over a short period of time, but rather persists and occurs more frequently over a long period of time. To the best of our knowledge, there is no prior work on mining persistent items in a data stream. In this paper, we address the fundamental problem of finding persistent items in a given data stream during a given period of time at any given observation point. We propose a novel scheme, PIE, that can accurately identify each persistent item with a probability greater than any desired false negative rate (FNR) while using a very small amount of memory. The key idea of PIE is that it uses Raptor codes to encode the ID of each item that appears at the observation point during a measurement period and stores only a few bits of the encoded ID in the memory of that observation point during that measurement period. A persistent item occurs in enough measurement periods that enough encoded bits of its ID can be retrieved from the observation point to decode them correctly and recover the ID of the persistent item. We implemented and extensively evaluated PIE using three real network traffic traces and compared its performance with two adapted prior schemes. Our results show that PIE not only achieves the desired FNR in every scenario, but its FNR is also, on average, 19.5 times smaller than the FNR of the best adapted prior art.
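The encode-a-few-bits-per-period idea described in the abstract can be illustrated with a toy sketch. This is not PIE itself: it swaps Raptor codes for a plain random linear fountain code over GF(2), and it ignores the shared memory structure, hash collisions between items, and the FNR analysis. The names `ID_BITS`, `BITS_PER_PERIOD`, `coded_bits`, and `decode` are hypothetical and chosen only for this illustration.

```python
import random

ID_BITS = 32          # assumed width of an item ID (hypothetical)
BITS_PER_PERIOD = 4   # assumed number of coded bits stored per period (hypothetical)


def parity(x):
    """Parity (XOR of all bits) of a non-negative integer."""
    return bin(x).count("1") & 1


def coded_bits(item_id, period):
    """Return the few coded bits stored for item_id in one measurement period.

    Each coded bit is the parity of a pseudo-random subset of ID bits.
    The subset (mask) depends only on the period, so a decoder can
    regenerate it without knowing the ID.
    """
    rng = random.Random(period)
    out = []
    for _ in range(BITS_PER_PERIOD):
        mask = rng.getrandbits(ID_BITS)
        out.append((mask, parity(item_id & mask)))
    return out


def decode(equations):
    """Recover the ID from (mask, bit) pairs via Gaussian elimination over GF(2).

    Returns None when the collected bits do not determine the ID uniquely,
    which is the case for an item seen in too few periods.
    """
    pivots = {}  # highest set bit position -> (mask, bit)
    for mask, bit in equations:
        # Eliminate every existing pivot position from this equation.
        for p in sorted(pivots, reverse=True):
            if (mask >> p) & 1:
                pm, pb = pivots[p]
                mask ^= pm
                bit ^= pb
        if mask:
            pivots[mask.bit_length() - 1] = (mask, bit)
    if len(pivots) < ID_BITS:
        return None
    # Back substitution from the lowest pivot upward.
    item_id = 0
    for p in sorted(pivots):
        mask, bit = pivots[p]
        lower = mask & ~(1 << p)
        item_id |= (bit ^ parity(item_id & lower)) << p
    return item_id


if __name__ == "__main__":
    item = 0xC0A80117  # example ID, e.g. an IPv4 address as an integer
    persistent = [e for t in range(16) for e in coded_bits(item, t)]
    transient = [e for t in range(4) for e in coded_bits(item, t)]
    print(decode(persistent), decode(transient))
```

In this toy version an item seen in only 4 periods contributes 16 coded bits, fewer than the 32 unknown ID bits, so `decode` returns `None`; an item seen in many periods contributes enough independent bits for the ID to be recovered with very high probability.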
Pages: 289-300
Number of pages: 12