Mining discriminative items in multiple data streams

Cited by: 6
Authors
Lin, Zhenhua [1]
Jiang, Bin [1]
Pei, Jian [1]
Jiang, Daxin [2]
Affiliations
[1] Simon Fraser Univ, Burnaby, BC V5A 1S6, Canada
[2] Microsoft Res Asia, Beijing, Peoples R China
Funding
Natural Sciences and Engineering Research Council of Canada
Keywords
data mining; data streams; discriminative items; FINDING FREQUENT
DOI
10.1007/s11280-010-0094-0
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
How can we maintain a dynamic profile capturing a user's reading interest against the common interest? What are the queries that have been asked 1,000 times more frequently to a search engine from users in Asia than in North America? What are the keywords (or tags) that are 1,000 times more frequent in the blog stream on computer games than in the blog stream on Hollywood movies? To answer such questions, we need to find discriminative items in multiple data streams. Each data source, such as Web search queries in a region or blog postings on a topic, can be modeled as a data stream because of the fast-growing volume of the source. Motivated by these applications, in this paper we study the problem of mining discriminative items in multiple data streams. We show that, to exactly find all discriminative items in stream S1 against stream S2 in one scan, the space lower bound is Ω(|Σ| log(n1/|Σ|)), where Σ is the alphabet of items and n1 is the current size of S1. To tackle the space challenge, we develop three heuristic algorithms that can achieve high precision and recall using sub-linear space and sub-linear processing time per item with respect to |Σ|. The complexity of all algorithms is independent of the size of the two streams. An extensive empirical study using both real and synthetic data sets verifies our design.
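To make the problem statement concrete, the following is a minimal sketch of an exact, non-streaming baseline, not one of the paper's three heuristic algorithms. It keeps one counter per distinct item, which is the linear-space cost the abstract's lower bound refers to. The function name `discriminative_items`, the ratio threshold `theta`, the support threshold `min_freq`, and the choice to compare relative frequencies are all assumptions made for illustration; the abstract does not give the formal definition.

```python
from collections import Counter


def discriminative_items(s1, s2, theta, min_freq=0.0):
    """Return items whose relative frequency in s1 is at least
    theta times their relative frequency in s2.

    Assumptions (not taken from the abstract): frequencies are
    normalized by stream length, items absent from s2 count as
    discriminative, and min_freq filters out rare items in s1.
    """
    c1, c2 = Counter(s1), Counter(s2)
    n1, n2 = len(s1), len(s2)
    result = []
    for item, count1 in c1.items():
        f1 = count1 / n1
        if f1 < min_freq:
            continue
        f2 = c2[item] / n2 if n2 else 0.0
        # Multiplying instead of dividing avoids a zero division
        # when the item never occurs in s2.
        if f1 >= theta * f2:
            result.append(item)
    return sorted(result)


if __name__ == "__main__":
    games = ["fps"] * 8 + ["actor"] * 2
    movies = ["fps"] * 1 + ["actor"] * 9
    print(discriminative_items(games, movies, theta=5))
```

Because this baseline stores a count for every distinct item in both streams, its memory grows with the alphabet size, which is the motivation for the sub-linear heuristics described in the abstract.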
Pages: 497 - 522
Number of pages: 26
Related papers
50 in total
  • [1] Mining discriminative items in multiple data streams
    Zhenhua Lin
    Bin Jiang
    Jian Pei
    Daxin Jiang
    World Wide Web, 2010, 13 : 497 - 522
  • [2] Mining Discriminative Itemsets in Data Streams
    Seyfi, Majid
    Geva, Shlomo
    Nayak, Richi
    WEB INFORMATION SYSTEMS ENGINEERING - WISE 2014, PT I, 2014, 8786 : 125 - 134
  • [3] Mining discriminative itemsets in data streams
    Seyfi, Majid (m.seyfi@qut.edu.au), 1600, Springer Verlag (8786):
  • [4] Mining Robust Frequent Items in Data Streams
    Xia, Rui
    Dai, Haipeng
    Du, Zhanchao
    Li, Meng
    Liu, Alex X.
    Chen, Guihai
    2020 IEEE INTERNATIONAL CONFERENCE ON JOINT CLOUD COMPUTING (JCC 2020), 2020, : 110 - 117
  • [5] Mining noisy data streams via a discriminative model
    Chu, F
    Wang, YZ
    Zaniolo, C
    DISCOVERY SCIENCE, PROCEEDINGS, 2004, 3245 : 47 - 59
  • [6] Methods for mining frequent items in data streams: an overview
    Hongyan Liu
    Yuan Lin
    Jiawei Han
    Knowledge and Information Systems, 2011, 26 : 1 - 30
  • [7] Methods for mining frequent items in data streams: an overview
    Liu, Hongyan
    Lin, Yuan
    Han, Jiawei
    KNOWLEDGE AND INFORMATION SYSTEMS, 2011, 26 (01) : 1 - 30
  • [8] A Mining Algorithm of Frequent Items in Data Streams Based on Apache Storm
    Hu, Weihua
    Guo, Ziang
    Chen, Mingzhong
    PROCEEDINGS OF THE 4TH INTERNATIONAL CONFERENCE ON MECHATRONICS, MATERIALS, CHEMISTRY AND COMPUTER ENGINEERING 2015 (ICMMCCE 2015), 2015, 39 : 2926 - 2930
  • [9] Mining Discriminative Itemsets Over Data Streams Using Efficient Sliding Window
    Seyfi M.
    Nayak R.
    Xu Y.
    SN Computer Science, 4 (5)
  • [10] False-negative frequent items mining from data streams with bursting
    Chong, ZH
    Yu, JX
    Lu, HJ
    Zhang, ZJ
    Zhou, AY
    DATABASE SYSTEMS FOR ADVANCED APPLICATIONS, PROCEEDINGS, 2005, 3453 : 422 - 434