Mining discriminative items in multiple data streams

Cited by: 6

Authors:

Lin, Zhenhua ^{[1]}

Jiang, Bin ^{[1]}

Pei, Jian ^{[1]}

Jiang, Daxin ^{[2]}

Affiliations:

[1] Simon Fraser Univ, Burnaby, BC V5A 1S6, Canada

[2] Microsoft Res Asia, Beijing, Peoples R China

Source:

WORLD WIDE WEB-INTERNET AND WEB INFORMATION SYSTEMS | 2010 / Vol. 13 / Issue 04

Funding:

Natural Sciences and Engineering Research Council of Canada;

Keywords:

data mining; data streams; discriminative items; FINDING FREQUENT;

DOI:

10.1007/s11280-010-0094-0

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

How can we maintain a dynamic profile capturing a user's reading interest against the common interest? What are the queries that have been asked 1,000 times more frequently to a search engine from users in Asia than in North America? What are the keywords (or tags) that are 1,000 times more frequent in the blog stream on computer games than in the blog stream on Hollywood movies? To answer such interesting questions, we need to find discriminative items in multiple data streams. Each data source, such as Web search queries in a region and blog postings on a topic, can be modeled as a data stream due to the fast-growing volume of the source. Motivated by the extensive applications, in this paper, we study the problem of mining discriminative items in multiple data streams. We show that, to exactly find all discriminative items in stream S_1 against stream S_2 in one scan, the space lower bound is Ω(|Σ| log(n_1/|Σ|)), where Σ is the alphabet of items and n_1 is the current size of S_1. To tackle the space challenge, we develop three heuristic algorithms that can achieve high precision and recall using sub-linear space and sub-linear processing time per item with respect to |Σ|. The complexity of all algorithms is independent of the size of the two streams. An extensive empirical study using both real data sets and synthetic data sets verifies our design.
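
As a rough illustration of the problem statement in the abstract, the following Python sketch finds discriminative items exactly by keeping one counter per distinct item. It is not one of the paper's three heuristic algorithms; it is the naive baseline whose space grows with the alphabet size, which is the cost those heuristics aim to avoid. The function name `discriminative_items`, the parameters `theta` and `min_count`, and the working definition (an item is discriminative when its relative frequency in the first stream is at least `theta` times its relative frequency in the second) are assumptions made for this example, not details taken from the record.

```python
from collections import Counter


def discriminative_items(stream1, stream2, theta, min_count=1):
    """Exact baseline (assumed definition, not the paper's algorithm).

    Returns the items whose relative frequency in stream1 is at least
    `theta` times their relative frequency in stream2, ignoring items
    that occur fewer than `min_count` times in stream1.
    """
    counts1 = Counter(stream1)  # one counter per distinct item: linear space
    counts2 = Counter(stream2)
    n1 = sum(counts1.values())  # current size of the first stream
    n2 = sum(counts2.values())  # current size of the second stream

    result = set()
    for item, f1 in counts1.items():
        if f1 < min_count:
            continue
        f2 = counts2.get(item, 0)
        # f1/n1 >= theta * f2/n2, cross-multiplied to avoid dividing by zero.
        # Items that never appear in stream2 (f2 == 0) always qualify.
        if f1 * n2 >= theta * f2 * n1:
            result.add(item)
    return result


if __name__ == "__main__":
    s1 = ["a"] * 8 + ["b"] * 2
    s2 = ["a"] * 1 + ["b"] * 9
    print(discriminative_items(s1, s2, theta=4))
```

With these two toy streams, "a" has relative frequency 0.8 in the first stream against 0.1 in the second, so it passes a threshold of 4, while "b" does not.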

Pages: 497-522

Page count: 26

50 items in total

[41] Sequential pattern mining in multiple streams
Chen, G
Wu, XD
Zhu, XQ
Fifth IEEE International Conference on Data Mining, Proceedings, 2005, : 585 - 588
[42] Improved algorithm for parallel mining collaborative frequent itemsets in multiple data streams
Liu, Fang'ai
Wang, Qianqian
Wang, Xin
CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS, 2019, 22 (Suppl 3) : S6133 - S6141
[43] Improved algorithm for parallel mining collaborative frequent itemsets in multiple data streams
Fang’ai Liu
Qianqian Wang
Xin Wang
Cluster Computing, 2019, 22 : 6133 - 6141
[44] Distributed web mining using Bayesian networks from multiple data streams
Chen, R
Sivakumar, K
Kargupta, H
2001 IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS, 2001, : 75 - 82
[45] Incremental mining of sequential patterns from multiple item set data streams
1600, (08):
[46] Mining serial episode rules with time lags over multiple data streams
Lee, Tung-Ying
Wang, En Tzu
Chen, Arbee L. P.
DATA WAREHOUSING AND KNOWLEDGE DISCOVERY, PROCEEDINGS, 2008, 5182 : 227 - +
[47] Estimating the Frequency of Data Items in Massive Distributed Streams
Anceaume, Emmanuelle
Busnel, Yann
Rivetti, Nicolo
2015 IEEE 4TH SYMPOSIUM ON NETWORK CLOUD COMPUTING AND APPLICATIONS - NCCA 2015, 2015, : 59 - 66
[48] Scout Sketch: Finding Promising Items in Data Streams
Ma, Tianyu
Gao, Guoju
Huang, He
Sun, Yu-E
Du, Yang
IEEE INFOCOM 2024-IEEE CONFERENCE ON COMPUTER COMMUNICATIONS, 2024, : 1561 - 1570
[49] A Probabilistic Sketch for Summarizing Cold Items of Data Streams
Liu, Yongqiang
Xie, Xike
IEEE-ACM TRANSACTIONS ON NETWORKING, 2024, 32 (02) : 1287 - 1302
[50] Efficiently discovering recent frequent items in data streams
Tantono, Ferry Irawan
Manerikar, Nishad
Palpanas, Themis
SCIENTIFIC AND STATISTICAL DATABASE MANAGEMENT, PROCEEDINGS, 2008, 5069 : 222 - +