Finding Persistent Items in Data Streams

Authors
Dai, Haipeng [1]
Shahzad, Muhammad [2]
Liu, Alex X. [1]
Zhong, Yuankun [1]
Affiliations
[1] Nanjing University, State Key Laboratory for Novel Software Technology, Nanjing, Jiangsu, People's Republic of China
[2] North Carolina State University, Department of Computer Science, Raleigh, NC 27695, USA
Source
Proceedings of the VLDB Endowment | 2016 / Vol. 10 / No. 4
Funding
National Natural Science Foundation of China; U.S. National Science Foundation
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Frequent item mining, which finds items that occur frequently in a data stream over a period of time, is one of the most heavily studied problems in data stream mining. Persistent item mining generalizes it: a persistent item, unlike a frequent item, does not necessarily occur more often than other items over a short period of time, but instead persists and occurs more often over a long period of time. To the best of our knowledge, there is no prior work on mining persistent items in a data stream. In this paper, we address the fundamental problem of finding persistent items in a given data stream during a given period of time at any given observation point. We propose a novel scheme, PIE, that can identify each persistent item accurately enough to meet any desired false negative rate (FNR) while using a very small amount of memory. The key idea of PIE is to use Raptor codes to encode the ID of each item that appears at the observation point during a measurement period and to store only a few bits of the encoded ID in the memory of that observation point for that measurement period. A persistent item occurs in enough measurement periods that enough encoded bits of its ID can be retrieved from the observation point to decode them correctly and recover the ID. We implemented and extensively evaluated PIE using three real network traffic traces and compared its performance with two prior schemes adapted to this problem. Our results show that PIE not only achieves the desired FNR in every scenario, but its FNR is also, on average, 19.5 times smaller than that of the best adapted prior scheme.
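The encode-a-few-bits-per-period idea in the abstract can be illustrated with a minimal sketch. It is not the PIE implementation: it swaps Raptor codes for a plain random linear fountain code over GF(2), and it leaves out the storage structure at the observation point (hashing, collisions, memory layout). The names `ID_BITS`, `BITS_PER_PERIOD`, `encode_bits`, `decode`, and `recover` are hypothetical, as are the 32-bit ID width and the 8 bits stored per period.

```python
import random

ID_BITS = 32          # assumed width of an item ID
BITS_PER_PERIOD = 8   # assumed number of encoded bits kept per measurement period


def parity(value):
    """Return the XOR of all bits of a non-negative integer."""
    return bin(value).count("1") & 1


def encode_bits(item_id, period):
    """Produce the few encoded bits stored for item_id in one period.

    Each encoded bit is the parity of a random subset (mask) of the ID bits.
    Masks are seeded by the period number, so a decoder can regenerate them.
    """
    rng = random.Random(period)
    encoded = []
    for _ in range(BITS_PER_PERIOD):
        mask = rng.getrandbits(ID_BITS)
        encoded.append((mask, parity(item_id & mask)))
    return encoded


def decode(equations):
    """Solve the GF(2) linear system by Gaussian elimination.

    Returns the ID, or None when too few independent bits were collected.
    """
    pivots = {}  # leading bit position -> (mask, parity bit)
    for mask, bit_value in equations:
        # Eliminate already-known pivot positions, highest first.
        for pos in sorted(pivots, reverse=True):
            if (mask >> pos) & 1:
                pivot_mask, pivot_value = pivots[pos]
                mask ^= pivot_mask
                bit_value ^= pivot_value
        if mask:
            pivots[mask.bit_length() - 1] = (mask, bit_value)
    if len(pivots) < ID_BITS:
        return None
    # Back-substitute from the lowest bit upward.
    item_id = 0
    for pos in sorted(pivots):
        mask, bit_value = pivots[pos]
        lower = mask & ((1 << pos) - 1)
        item_id |= (bit_value ^ parity(item_id & lower)) << pos
    return item_id


def recover(item_id, periods_seen):
    """Collect the bits stored in every period the item appeared in, then decode."""
    equations = []
    for period in periods_seen:
        equations.extend(encode_bits(item_id, period))
    return decode(equations)
```

In this sketch, an item seen in 12 periods contributes 96 encoded bits, far more than the 32 unknown ID bits, so decoding succeeds with overwhelming probability. An item seen in only 2 periods contributes 16 bits, so `recover` returns `None`. This mirrors the abstract's point that only items occurring in enough measurement periods yield enough bits to be decoded.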
Pages: 289-300
Number of pages: 12
Related papers
50 in total
  • [31] Mining discriminative items in multiple data streams
    Zhenhua Lin
    Bin Jiang
    Jian Pei
    Daxin Jiang
    World Wide Web, 2010, 13 : 497 - 522
  • [32] Mining discriminative items in multiple data streams
    Lin, Zhenhua
    Jiang, Bin
    Pei, Jian
    Jiang, Daxin
    WORLD WIDE WEB-INTERNET AND WEB INFORMATION SYSTEMS, 2010, 13 (04) : 497 - 522
  • [33] Finding Persistent Items using Invertible Bloom Lookup Table
    Lv, Zhoudan
    He, Feng
    Chen, Lin
    PROCEEDING OF THE 2019 INTERNATIONAL CONFERENCE ON COMPUTER, INFORMATION AND TELECOMMUNICATION SYSTEMS (IEEE CITS 2019), 2019 : 101 - 105
  • [34] BurstSketch: Finding Bursts in Data Streams
    Miao, Ruijie
    Zhong, Zheng
    Guo, Jiarui
    Li, Zikun
    Yang, Tong
    Cui, Bin
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2023, 35 (11) : 11126 - 11140
  • [35] Finding closed itemsets in data streams
    Wang, H
    Li, WY
    Li, ZZ
    Fan, L
    KNOWLEDGE-BASED INTELLIGENT INFORMATION AND ENGINEERING SYSTEMS, PT 2, PROCEEDINGS, 2005, 3682 : 964 - 971
  • [36] BurstSketch: Finding Bursts in Data Streams
    Zhong, Zheng
    Yan, Shen
    Li, Zikun
    Tan, Decheng
    Yang, Tong
    Cui, Bin
    SIGMOD '21: PROCEEDINGS OF THE 2021 INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2021 : 2375 - 2383
  • [37] Finding graph matchings in data streams
    McGregor, A
    APPROXIMATION, RANDOMIZATION AND COMBINATORIAL OPTIMIZATION: ALGORITHMS AND TECHNIQUES, 2005, 3624 : 170 - 181
  • [38] Finding the hottest item in data streams
    Lin, Huaizhong
    Wu, Shanshan
    Hou, Leong U.
    Kou, Ngai Meng
    Gao, Yunjun
    Lu, Dongming
    INFORMATION SCIENCES, 2018, 430 : 314 - 330
  • [39] Finding items cannibalization and synergy by BWS data
    Lipovetsky, Stan
    Conklin, Michael
    JOURNAL OF CHOICE MODELLING, 2014, 12 : 1 - 9
  • [40] Finding frequent items over data stream
    Tu, Li
    Chen, Ling
    Journal of Computational Information Systems, 2010, 6 (12) : 4127 - 4134