Finding Persistent Items in Data Streams

Times cited: 2

Authors:

Dai, Haipeng [1]

Shahzad, Muhammad [2]

Liu, Alex X. [1]

Zhong, Yuankun [1]

Affiliations:

[1] Nanjing University, State Key Laboratory for Novel Software Technology, Nanjing, Jiangsu, People's Republic of China

[2] North Carolina State University, Department of Computer Science, Raleigh, NC 27695, USA

Source:

PROCEEDINGS OF THE VLDB ENDOWMENT | 2016 / Vol. 10 / No. 4

Funding:

National Natural Science Foundation of China; US National Science Foundation

DOI:

Not available

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline code:

0812

Abstract:

Frequent item mining, which deals with finding items that occur frequently in a given data stream over a period of time, is a heavily studied problem in data stream mining. A generalization of frequent item mining is persistent item mining: unlike a frequent item, a persistent item does not necessarily occur more frequently than other items over a short period of time, but rather persists and occurs more frequently over a long period of time. To the best of our knowledge, there is no prior work on mining persistent items in a data stream. In this paper, we address the fundamental problem of finding persistent items in a given data stream during a given period of time at any given observation point. We propose a novel scheme, PIE, that can accurately identify each persistent item while meeting any desired false negative rate (FNR) and using a very small amount of memory. The key idea of PIE is to use Raptor codes to encode the ID of each item that appears at the observation point during a measurement period, and to store only a few bits of the encoded ID in the observation point's memory for that period. A persistent item occurs in enough measurement periods that enough encoded bits of its ID can be retrieved from the observation point to decode them correctly and recover the item's ID. We implemented PIE, evaluated it extensively on three real network traffic traces, and compared its performance with two adapted prior schemes. Our results show that not only does PIE achieve the desired FNR in every scenario, but its FNR is, on average, 19.5 times smaller than that of the best adapted prior scheme.
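The storage-and-recovery idea in the abstract can be illustrated with a toy sketch. It is not PIE: as a loudly stated assumption, it swaps Raptor codes for uncoded random bit sampling, so each measurement period stores only a short fingerprint plus one raw bit of the item's ID at a hash-chosen position, and an ID is reassembled once every bit position has been seen across enough periods. The names `record_period`, `decode`, `fingerprint`, `bit_position`, `ID_BITS`, and `FP_BITS`, the hash function, and the 16-bit ID width are all hypothetical choices made for illustration only.

```python
import hashlib
from collections import defaultdict

ID_BITS = 16     # hypothetical: item IDs are 16-bit integers
FP_BITS = 16     # hypothetical: width of the per-cell fingerprint
COLLIDED = None  # marker for a cell written by two different items

def _h(*parts) -> int:
    """Deterministic 64-bit hash of the given parts (an illustrative choice)."""
    data = "|".join(map(str, parts)).encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")

def fingerprint(item_id: int) -> int:
    return _h("fp", item_id) % (1 << FP_BITS)

def bit_position(fp: int, period: int) -> int:
    # Depends only on (fingerprint, period), so the decoder can recompute it
    # and each cell only has to store the fingerprint and a single ID bit.
    return _h("pos", fp, period) % ID_BITS

def record_period(items, period: int, num_cells: int) -> dict:
    """Summarize one measurement period as a small array: cell -> (fp, bit)."""
    cells = {}
    for item_id in set(items):  # persistence counts presence, not frequency
        fp = fingerprint(item_id)
        entry = (fp, (item_id >> bit_position(fp, period)) & 1)
        idx = _h("cell", item_id, period) % num_cells
        if idx in cells and cells[idx] != entry:
            cells[idx] = COLLIDED
        else:
            cells[idx] = entry
    return cells

def decode(period_arrays: dict, threshold: int) -> set:
    """Reassemble IDs of items seen in at least `threshold` periods."""
    votes = defaultdict(lambda: [[0, 0] for _ in range(ID_BITS)])  # fp -> per-bit counts
    periods_seen = defaultdict(set)                                # fp -> periods present
    for period, cells in period_arrays.items():
        for entry in cells.values():
            if entry is COLLIDED:
                continue
            fp, bit = entry
            votes[fp][bit_position(fp, period)][bit] += 1
            periods_seen[fp].add(period)
    found = set()
    for fp, counts in votes.items():
        if len(periods_seen[fp]) < threshold:
            continue  # not persistent enough
        if any(zeros + ones == 0 for zeros, ones in counts):
            continue  # some ID bit was never collected
        # Majority vote per position tolerates stray bits from other items
        # that happen to share this fingerprint.
        item_id = sum(1 << pos for pos, (zeros, ones) in enumerate(counts) if ones > zeros)
        if fingerprint(item_id) == fp:  # sanity check on the reassembled ID
            found.add(item_id)
    return found

# Usage: item 4242 appears in every period, item 777 in every third period,
# and three transient items appear once each per period.
arrays = {}
for t in range(300):
    items = [4242] + [10000 + 3 * t + k for k in range(3)]
    if t % 3 == 0:
        items.append(777)
    arrays[t] = record_period(items, t, num_cells=64)

print(decode(arrays, threshold=200))
```

Because this sketch stores raw bits, it needs every specific bit position to be collected, which behaves like a coupon-collector process and takes many periods. A fountain code such as a Raptor code lets a decoder recover k source symbols from roughly any slightly-more-than-k encoded symbols, which is why the abstract's scheme can get by with fewer stored bits per item.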


Pages: 289–300

Page count: 12
