Mining discriminative items in multiple data streams

Cited by: 6

Authors:

Lin, Zhenhua [1]

Jiang, Bin [1]

Pei, Jian [1]

Jiang, Daxin [2]

Affiliations:

[1] Simon Fraser Univ, Burnaby, BC V5A 1S6, Canada

[2] Microsoft Res Asia, Beijing, Peoples R China

Source:

WORLD WIDE WEB-INTERNET AND WEB INFORMATION SYSTEMS | 2010 / Vol. 13 / No. 04

Funding:

Natural Sciences and Engineering Research Council of Canada;

Keywords:

data mining; data streams; discriminative items; FINDING FREQUENT;

DOI:

10.1007/s11280-010-0094-0

Chinese Library Classification:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812 ;

Abstract:

How can we maintain a dynamic profile capturing a user's reading interest against the common interest? What are the queries that users in Asia have submitted to a search engine 1,000 times more frequently than users in North America? What are the keywords (or tags) that are 1,000 times more frequent in the blog stream on computer games than in the blog stream on Hollywood movies? To answer such questions, we need to find discriminative items in multiple data streams. Each data source, such as Web search queries in a region or blog postings on a topic, can be modeled as a data stream because of its fast-growing volume. Motivated by these extensive applications, in this paper we study the problem of mining discriminative items in multiple data streams. We show that, to exactly find all discriminative items in stream S_1 against stream S_2 in one scan, the space lower bound is Ω(|Σ| log(n_1/|Σ|)), where Σ is the alphabet of items and n_1 is the current size of S_1. To tackle the space challenge, we develop three heuristic algorithms that can achieve high precision and recall using sub-linear space and sub-linear processing time per item with respect to |Σ|. The complexities of all three algorithms are independent of the sizes of the two streams. An extensive empirical study using both real and synthetic data sets verifies our design.
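To make the problem statement concrete, here is a minimal sketch of an exact baseline. It is not one of the paper's three heuristic algorithms. It assumes, since the abstract does not give the formal definition, that an item is discriminative when its relative frequency in S_1 is at least theta times its relative frequency in S_2, and that an item absent from S_2 counts as discriminative. The function name `discriminative_items` and the parameter `theta` are hypothetical. The baseline keeps one counter per distinct item, so its space grows with the alphabet size, which is the cost the sub-linear heuristics are meant to avoid.

```python
from collections import Counter


def discriminative_items(stream1, stream2, theta):
    """Exact baseline (assumed definition): return the items whose relative
    frequency in stream1 is at least theta times that in stream2."""
    c1, c2 = Counter(stream1), Counter(stream2)
    n1, n2 = sum(c1.values()), sum(c2.values())
    result = []
    for item, k1 in c1.items():
        k2 = c2[item]  # Counter returns 0 for items never seen in stream2
        # Compare k1/n1 >= theta * k2/n2 by cross-multiplication,
        # which avoids division and floating-point rounding.
        if k1 * n2 >= theta * k2 * n1:
            result.append(item)
    return sorted(result)


# Toy streams: "a" is 8x more frequent in s1 than in s2.
s1 = ["a"] * 8 + ["b"] * 2
s2 = ["a"] * 1 + ["b"] * 9
print(discriminative_items(s1, s2, theta=5))
```

On these toy streams, "a" has relative frequency 0.8 in the first stream and 0.1 in the second, so it passes a threshold of 5, while "b" does not.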

Citation

Pages: 497-522

Page count: 26

