A Multi-Domain Architecture for Mining Frequent Items and Itemsets from Distributed Data Streams

Cited by: 8
Authors
Cesario, Eugenio [1]
Mastroianni, Carlo [1]
Talia, Domenico [2, 3]
Affiliations
[1] ICAR CNR, I-87036 Arcavacata Di Rende, CS, Italy
[2] Univ Calabria, ICAR CNR, I-87036 Arcavacata Di Rende, CS, Italy
[3] Univ Calabria, DIMES, I-87036 Arcavacata Di Rende, CS, Italy
Keywords
Distributed data mining; Frequent items; Frequent itemsets; Grid; Stream mining
DOI
10.1007/s10723-013-9277-0
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Real-time analysis of distributed data streams is a challenging task because it requires scalable solutions to handle streams of data that are generated very rapidly by multiple sources. This paper presents the design and implementation of an architecture for the analysis of data streams in distributed environments. In particular, data stream analysis is applied to compute the items and itemsets that exceed a frequency threshold. The mining approach is hybrid: frequent items are calculated in a single pass, using a sketch algorithm, while frequent itemsets are calculated by a further multi-pass analysis. The architecture combines parallel and distributed processing to keep pace with the rate of the distributed data streams. To keep computation close to the data, miners are distributed among the domains where the data streams are generated. The paper reports the experimental results obtained with a prototype of the architecture, tested on a Grid composed of three domains, each handling one data stream.
Pages: 153-168
Page count: 16
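
The single-pass, sketch-based stage mentioned in the abstract can be illustrated with a small example. The abstract does not say which sketch algorithm the paper uses, so the following is only a minimal sketch under stated assumptions: it uses a Count-Min sketch as a stand-in, builds one local sketch per domain, and merges them into a global one. The names `CountMinSketch` and `frequent_items`, the sketch dimensions, and the candidate set are all hypothetical and not taken from the paper. The multi-pass itemset stage is not shown.

```python
import hashlib


class CountMinSketch:
    """Fixed-size frequency summary; estimates never fall below true counts."""

    def __init__(self, width=256, depth=4):
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]
        self.total = 0  # number of items seen so far

    def _index(self, item, row):
        # One hash function per row, derived by salting with the row number.
        digest = hashlib.sha256(f"{row}:{item}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, item):
        self.total += 1
        for r in range(self.depth):
            self.rows[r][self._index(item, r)] += 1

    def estimate(self, item):
        # The minimum over rows is the least-inflated counter.
        return min(self.rows[r][self._index(item, r)] for r in range(self.depth))

    def merge(self, other):
        # Sketches with equal dimensions combine by cell-wise addition.
        for r in range(self.depth):
            for c in range(self.width):
                self.rows[r][c] += other.rows[r][c]
        self.total += other.total


def frequent_items(streams, candidates, threshold):
    """Return candidates whose estimated global frequency reaches the threshold.

    Assumption: each stream stands for one domain, summarized locally in a
    single pass and then merged; `threshold` is a fraction of all items.
    """
    global_sketch = CountMinSketch()
    for stream in streams:
        local = CountMinSketch()
        for item in stream:
            local.add(item)
        global_sketch.merge(local)
    min_count = threshold * global_sketch.total
    return {c for c in candidates if global_sketch.estimate(c) >= min_count}
```

Because a Count-Min sketch can overestimate but never underestimate a count, every truly frequent candidate is reported, while an occasional infrequent one may slip through; a later exact pass can filter those out.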