A Multi-Domain Architecture for Mining Frequent Items and Itemsets from Distributed Data Streams

Cited by: 8
Authors
Cesario, Eugenio [1]
Mastroianni, Carlo [1]
Talia, Domenico [2, 3]
Affiliations
[1] ICAR CNR, I-87036 Arcavacata Di Rende, CS, Italy
[2] Univ Calabria, ICAR CNR, I-87036 Arcavacata Di Rende, CS, Italy
[3] Univ Calabria, DIMES, I-87036 Arcavacata Di Rende, CS, Italy
Keywords
Distributed data mining; Frequent items; Frequent itemsets; Grid; Stream mining
DOI
10.1007/s10723-013-9277-0
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Real-time analysis of distributed data streams is challenging because it requires scalable solutions that can handle data generated very rapidly by multiple sources. This paper presents the design and implementation of an architecture for analyzing data streams in distributed environments. In particular, the analysis computes the items and itemsets whose frequency exceeds a threshold. The mining approach is hybrid: frequent items are computed in a single pass using a sketch algorithm, while frequent itemsets are computed by a further multi-pass analysis. The architecture combines parallel and distributed processing to keep pace with the rate of the distributed data streams. To keep computation close to the data, miners are distributed among the domains where the data streams are generated. The paper reports experimental results obtained with a prototype of the architecture, tested on a Grid composed of three domains, each handling one data stream.
Pages: 153-168
Number of pages: 16
Related papers
50 in total
  • [21] Mining frequent itemsets over distributed data streams by continuously maintaining a global synopsis
    En Tzu Wang
    Arbee L. P. Chen
    Data Mining and Knowledge Discovery, 2011, 23 : 252 - 299
  • [22] Mining frequent itemsets over distributed data streams by continuously maintaining a global synopsis
    Wang, En Tzu
    Chen, Arbee L. P.
    DATA MINING AND KNOWLEDGE DISCOVERY, 2011, 23 (02) : 252 - 299
  • [23] Mining Robust Frequent Items in Data Streams
    Xia, Rui
    Dai, Haipeng
    Du, Zhanchao
    Li, Meng
    Liu, Alex X.
    Chen, Guihai
    2020 IEEE INTERNATIONAL CONFERENCE ON JOINT CLOUD COMPUTING (JCC 2020), 2020 : 110 - 117
  • [24] A survey on algorithms for mining frequent itemsets over data streams
    Cheng, James
    Ke, Yiping
    Ng, Wilfred
    KNOWLEDGE AND INFORMATION SYSTEMS, 2008, 16 (01) : 1 - 27
  • [25] A Novel Strategy for Mining Frequent Closed Itemsets in Data Streams
    Tang, Keming
    Dai, Caiyan
    Chen, Ling
    JOURNAL OF COMPUTERS, 2012, 7 (07) : 1564 - 1573
  • [26] Mining frequent itemsets in data streams within a time horizon
    Troiano, Luigi
    Scibelli, Giacomo
    DATA & KNOWLEDGE ENGINEERING, 2014, 89 : 21 - 37
  • [27] Mining of Probabilistic Frequent Itemsets over Uncertain Data Streams
    Liu Lixin
    Zhang Xiaolin
    Zhang Huanxiang
    2014 11TH WEB INFORMATION SYSTEM AND APPLICATION CONFERENCE (WISA), 2014 : 231 - 237
  • [28] Frequent Itemsets Mining in Data Streams Using Reconfigurable Hardware
    Bustio, Lazaro
    Cumplido, Rene
    Hernandez, Raudel
    Bande, Jose M.
    Feregrino, Claudia
    NEW FRONTIERS IN MINING COMPLEX PATTERNS, 2016, 9607 : 32 - 45
  • [29] Efficient mining algorithm of frequent itemsets for uncertain data streams
    Wang Qianqian
    Liu Fang-ai
    PROCEEDINGS OF 2016 9TH INTERNATIONAL SYMPOSIUM ON COMPUTATIONAL INTELLIGENCE AND DESIGN (ISCID), VOL 2, 2016 : 443 - 446
  • [30] A survey on algorithms for mining frequent itemsets over data streams
    James Cheng
    Yiping Ke
    Wilfred Ng
    Knowledge and Information Systems, 2008, 16 : 1 - 27