Finding top-k elements in data streams

Cited by: 42

Authors:

Homem, Nuno [1]

Carvalho, Joao Paulo [1]

Affiliation:

[1] INESC ID, TULisbon Inst Super Tecn, P-1000029 Lisbon, Portugal

Source:

INFORMATION SCIENCES | 2010 / Vol. 180 / No. 24

Keywords:

Approximate algorithms; Top-k algorithms; Most frequent; Estimation; Data stream frequencies; FREQUENT ITEMSETS

DOI:

10.1016/j.ins.2010.08.024

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Subject classification code:

0812

Abstract:

Identifying the most frequent elements in a data stream is a well-known and difficult problem. Identifying the most frequent elements for each individual, especially in very large populations, is even harder. Fast algorithms with a small memory footprint are paramount when the number of individuals is very large. In many situations, such analysis needs to be performed and kept up to date in near real time. Fortunately, approximate answers are usually adequate for this problem. This paper presents a new algorithm that addresses this problem by merging the commonly used counter-based and sketch-based techniques for top-k identification. The algorithm provides the top-k list of elements, their frequencies, and an error estimate for each frequency value. It also provides strong guarantees on the error estimate, on the order of elements, and on the inclusion of elements in the list depending on their real frequency. Additionally, the algorithm provides stochastic bounds on the error and expected error estimates. Telecommunications customers' behavior and voice call data are used to present concrete results obtained with this algorithm and to illustrate improvements over previously existing algorithms. (C) 2010 Elsevier Inc. All rights reserved.
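The abstract describes merging a counter-based technique with a sketch-based one. The Python block below is a hypothetical sketch of one way such a combination can work, not the algorithm published in the paper. A Space-Saving style list of at most `m` monitored elements (counter-based) sits behind an array of `num_cells` hash-indexed counters (sketch-based), which absorbs arrivals of unmonitored elements until one looks frequent enough to enter the list. The class name `FilteredTopK`, the parameters `m` and `num_cells`, the SHA-256 cell hashing, and the admission and eviction rules are all illustrative assumptions. Each monitored element carries a count and an error such that `count - error <= true frequency <= count`.

```python
import hashlib
from typing import Dict, List, Tuple


class FilteredTopK:
    """Hypothetical sketch: hash-indexed counters (sketch-based) filtering
    admission into a Space-Saving style monitored list (counter-based).
    Illustrative only; not the algorithm published in the paper."""

    def __init__(self, m: int, num_cells: int) -> None:
        self.m = m                    # maximum number of monitored elements
        self.num_cells = num_cells    # size of the hash-indexed counter array
        # alpha[h] upper-bounds the true count of any unmonitored element in cell h
        self.alpha = [0] * num_cells
        # monitored element -> (count, error), with count - error <= true <= count
        self.monitored: Dict[str, Tuple[int, int]] = {}

    def _cell(self, item: str) -> int:
        # Stable hash (unlike built-in hash(), which is salted per process).
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.num_cells

    def update(self, item: str) -> None:
        if item in self.monitored:
            count, err = self.monitored[item]
            self.monitored[item] = (count + 1, err)
            return
        h = self._cell(item)
        if len(self.monitored) < self.m:
            # Free slot: admit immediately, using alpha[h] as the error bound.
            self.monitored[item] = (self.alpha[h] + 1, self.alpha[h])
            return
        victim = min(self.monitored, key=lambda k: self.monitored[k][0])
        if self.alpha[h] + 1 >= self.monitored[victim][0]:
            # Evict the smallest counter and fold its count back into its cell
            # so the cell keeps upper-bounding the victim's true frequency.
            v_count, _ = self.monitored.pop(victim)
            vh = self._cell(victim)
            self.alpha[vh] = max(self.alpha[vh], v_count)
            self.monitored[item] = (self.alpha[h] + 1, self.alpha[h])
        else:
            # Not frequent enough yet: only the shared cell counter grows.
            self.alpha[h] += 1

    def top_k(self, k: int) -> List[Tuple[str, int, int]]:
        """Return up to k (element, count, error) triples, largest count first."""
        ranked = sorted(self.monitored.items(), key=lambda kv: kv[1][0], reverse=True)
        return [(item, count, err) for item, (count, err) in ranked[:k]]


if __name__ == "__main__":
    stream = ["a"] * 50 + ["b"] * 30 + ["c"] * 20 + [f"x{i}" for i in range(10)]
    tk = FilteredTopK(m=5, num_cells=64)
    for s in stream:
        tk.update(s)
    print(tk.top_k(3))  # [('a', 50, 0), ('b', 30, 0), ('c', 20, 0)]
```

The linear scan for the minimum keeps the sketch short; a production version would more likely use a structure such as Space-Saving's Stream-Summary to find the smallest counter cheaply. SHA-256 is used instead of Python's built-in `hash` so that cell assignment is reproducible across runs.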

Pages: 4958-4974

Page count: 17
