Mining discriminative items in multiple data streams

Cited by: 6
Authors
Lin, Zhenhua [1]
Jiang, Bin [1]
Pei, Jian [1]
Jiang, Daxin [2]
Affiliations
[1] Simon Fraser Univ, Burnaby, BC V5A 1S6, Canada
[2] Microsoft Res Asia, Beijing, Peoples R China
Funding
Natural Sciences and Engineering Research Council of Canada;
Keywords
data mining; data streams; discriminative items; FINDING FREQUENT;
DOI
10.1007/s11280-010-0094-0
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
How can we maintain a dynamic profile capturing a user's reading interest against the common interest? Which queries have been submitted to a search engine 1,000 times more frequently by users in Asia than by users in North America? Which keywords (or tags) are 1,000 times more frequent in the blog stream on computer games than in the blog stream on Hollywood movies? To answer such questions, we need to find discriminative items in multiple data streams. Each data source, such as Web search queries in a region or blog postings on a topic, can be modeled as a data stream because of the fast-growing volume of the source. Motivated by these extensive applications, in this paper we study the problem of mining discriminative items in multiple data streams. We show that, to exactly find all discriminative items in stream S_1 against stream S_2 in one scan, the space lower bound is expressed in terms of |Σ| and n_1, where Σ is the alphabet of items and n_1 is the current size of S_1. To tackle the space challenge, we develop three heuristic algorithms that achieve high precision and recall using sub-linear space and sub-linear processing time per item with respect to |Σ|. The complexity of all algorithms is independent of the sizes of the two streams. An extensive empirical study using both real and synthetic data sets verifies our design.
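As a rough illustration of the problem statement only, the sketch below finds items whose relative frequency in one stream is at least `theta` times their relative frequency in another. It is an exact baseline that keeps one counter per distinct item, so it uses space linear in the alphabet size and is not one of the paper's three heuristic algorithms. The function name, the ratio-of-relative-frequencies criterion, and the `min_count` guard are assumptions made for this example; the record does not give the paper's formal definition.

```python
from collections import Counter
from typing import Hashable, Iterable, Set


def discriminative_items(
    s1: Iterable[Hashable],
    s2: Iterable[Hashable],
    theta: float,
    min_count: int = 1,
) -> Set[Hashable]:
    """Exact baseline (hypothetical helper, not the paper's method).

    Returns items whose relative frequency in s1 is at least theta times
    their relative frequency in s2. Items absent from s2 always qualify,
    provided they occur at least min_count times in s1.
    """
    c1, c2 = Counter(s1), Counter(s2)
    n1, n2 = sum(c1.values()), sum(c2.values())
    result = set()
    for item, f1 in c1.items():
        if f1 < min_count:
            continue  # assumed guard against very rare items
        r1 = f1 / n1
        r2 = c2[item] / n2 if n2 else 0.0  # Counter yields 0 for unseen items
        if r1 >= theta * r2:
            result.add(item)
    return result


if __name__ == "__main__":
    games = ["fps"] * 8 + ["actor"] * 2
    movies = ["fps"] * 1 + ["actor"] * 9
    print(discriminative_items(games, movies, theta=5))
```

Because every distinct item gets its own counter, memory grows with the alphabet size, which is the cost the abstract's sub-linear heuristics are meant to avoid.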
Pages: 497-522
Number of pages: 26