A Multi-Domain Architecture for Mining Frequent Items and Itemsets from Distributed Data Streams

Cited by: 0
Authors
Eugenio Cesario
Carlo Mastroianni
Domenico Talia
Affiliations
[1] ICAR-CNR, ICAR
[2] University of Calabria, CNR and DIMES
Source
Journal of Grid Computing | 2014 / Vol. 12
Keywords
Distributed data mining; Frequent items; Frequent itemsets; Grid; Stream mining
DOI
Not available
Abstract
Real-time analysis of distributed data streams is a challenging task, since it requires scalable solutions to handle streams of data that are generated very rapidly by multiple sources. This paper presents the design and implementation of an architecture for the analysis of data streams in distributed environments. In particular, the analysis computes the items and itemsets whose frequency exceeds a given threshold. The mining approach is hybrid: frequent items are computed in a single pass using a sketch algorithm, while frequent itemsets are computed by a further multi-pass analysis. The architecture combines parallel and distributed processing to keep pace with the rate of the distributed data streams. To keep computation close to the data, miners are distributed among the domains where the data streams are generated. The paper reports experimental results obtained with a prototype of the architecture, tested on a Grid composed of three domains, each handling one data stream.
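The abstract describes the hybrid pipeline only at a high level. The Python sketch below illustrates one plausible reading of it, under explicit assumptions that are not taken from the paper: Count-Min is used as the sketch (the abstract only says "a sketch algorithm"), each domain runs a hypothetical `LocalMiner` that makes one pass over its stream of transactions and buffers them, a coordinator merges the per-domain sketches to obtain global frequent items, and frequent itemsets are then found with Apriori-style level-wise passes in which each domain counts candidates locally and only the counts are summed. All class and function names, and the parameters `phi`, `width` and `depth`, are illustrative and are not the authors' code.

```python
import hashlib
from itertools import combinations


class CountMinSketch:
    """Count-Min sketch: `depth` rows of `width` counters; estimates never undercount."""

    def __init__(self, width=256, depth=4):
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]

    def _cells(self, item):
        # One deterministic hash function per row, derived from blake2b.
        for row in range(self.depth):
            digest = hashlib.blake2b(f"{row}:{item}".encode(), digest_size=8).digest()
            yield row, int.from_bytes(digest, "little") % self.width

    def add(self, item, count=1):
        for row, col in self._cells(item):
            self.table[row][col] += count

    def estimate(self, item):
        return min(self.table[row][col] for row, col in self._cells(item))

    def merge(self, other):
        # Cell-wise addition is valid because both sketches share width, depth and hashes.
        assert (self.width, self.depth) == (other.width, other.depth)
        for r in range(self.depth):
            for c in range(self.width):
                self.table[r][c] += other.table[r][c]


class LocalMiner:
    """Hypothetical per-domain miner: a single pass over its own transaction stream."""

    def __init__(self, phi, width=256, depth=4):
        self.phi = phi                      # relative frequency threshold
        self.sketch = CountMinSketch(width, depth)
        self.candidates = set()             # items that ever looked locally frequent
        self.n = 0                          # transactions seen so far
        self.buffer = []                    # kept for the later multi-pass itemset step

    def process(self, transaction):
        items = set(transaction)
        self.n += 1
        self.buffer.append(items)
        for item in items:
            self.sketch.add(item)
            # A finally-frequent item is above threshold at its last arrival,
            # so this test never drops a truly (locally) frequent item.
            if self.sketch.estimate(item) >= self.phi * self.n:
                self.candidates.add(item)


def global_frequent_items(miners, phi):
    """Coordinator: merge sketches, union candidates, keep estimates >= phi * N."""
    merged = CountMinSketch(miners[0].sketch.width, miners[0].sketch.depth)
    candidates, total = set(), 0
    for m in miners:
        merged.merge(m.sketch)
        candidates |= m.candidates
        total += m.n
    return {x for x in candidates if merged.estimate(x) >= phi * total}


def frequent_itemsets(miners, frequent_items, phi, max_size=3):
    """Apriori-style passes seeded by the frequent items; counts are exact."""
    total = sum(m.n for m in miners)
    level = {frozenset([x]) for x in frequent_items}
    result = {}
    for size in range(2, max_size + 1):
        # Join, then prune candidates having an infrequent (size - 1)-subset.
        cands = {a | b for a in level for b in level if len(a | b) == size}
        cands = {c for c in cands
                 if all(frozenset(s) in level for s in combinations(c, size - 1))}
        # Each domain scans its own buffer once per level; only counts are summed.
        counts = {c: sum(sum(1 for t in m.buffer if c <= t) for m in miners)
                  for c in cands}
        level = {c for c, n in counts.items() if n >= phi * total}
        result.update((c, counts[c]) for c in level)
        if not level:
            break
    return result


if __name__ == "__main__":
    # Three toy domains, echoing the three-domain prototype (data is made up).
    streams = [
        [["milk", "bread"], ["milk", "bread", "eggs"], ["milk"], ["bread", "eggs"]],
        [["milk", "bread"], ["beer"], ["milk", "bread"], ["milk", "eggs"]],
        [["bread"], ["milk", "bread"], ["beer", "chips"], ["milk", "bread", "eggs"]],
    ]
    phi = 0.5
    miners = [LocalMiner(phi) for _ in streams]
    for miner, stream in zip(miners, streams):
        for t in stream:
            miner.process(t)
    items = global_frequent_items(miners, phi)
    print(sorted(items))
    sets = frequent_itemsets(miners, items, phi)
    print({tuple(sorted(k)): v for k, v in sets.items()})  # {('bread', 'milk'): 6}
```

Two properties make the single-pass step safe in this sketch: a Count-Min estimate never undercounts, and an item that is frequent over the union of all streams must be frequent in at least one domain's stream, so the union of local candidates cannot miss a globally frequent item. Sketch overestimates can still admit infrequent items as false positives; because the itemset passes count exactly, such items do not produce spurious itemsets.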
Pages: 153-168
Number of pages: 15
Related papers (50 in total)
  • [31] An Efficient Algorithm for Mining Closed Frequent Itemsets in Data Streams
    Ao, Fujiang
    Du, Jing
    Yan, Yuejin
    Liu, Baohong
    Huang, Kedi
    [J]. 8TH IEEE INTERNATIONAL CONFERENCE ON COMPUTER AND INFORMATION TECHNOLOGY WORKSHOPS: CIT WORKSHOPS 2008, PROCEEDINGS, 2008, : 37 - +
  • [32] Mining Frequent Itemsets with Normalized Weight in Continuous Data Streams
    Kim, Younghee
    Kim, Wonyoung
    Kim, Ungmo
    [J]. JOURNAL OF INFORMATION PROCESSING SYSTEMS, 2010, 6 (01): : 79 - 90
  • [33] Uncertain Frequent Itemsets Mining Algorithm on Data Streams with Constraints
    Yu, Qun
    Tang, Ke-Ming
    Tang, Shi-Xi
    Lv, Xin
    [J]. INTELLIGENT DATA ENGINEERING AND AUTOMATED LEARNING - IDEAL 2016, 2016, 9937 : 192 - 201
  • [34] Mining Frequent Itemsets in Data Streams Based on Genetic Algorithm
    Han, Chong
    Sun, Lijuan
    Guo, Jian
    Chen, Xiaodong
    [J]. 2013 15TH IEEE INTERNATIONAL CONFERENCE ON COMMUNICATION TECHNOLOGY (ICCT), 2013, : 748 - 753
  • [36] Methods for mining frequent items in data streams: an overview
    Liu, Hongyan
    Lin, Yuan
    Han, Jiawei
    [J]. KNOWLEDGE AND INFORMATION SYSTEMS, 2011, 26 (01) : 1 - 30
  • [37] Processing frequent items over distributed data streams
    Zhang, DD
    Li, JZ
    Wang, WP
    Guo, LJ
    Ai, CY
    [J]. WEB TECHNOLOGIES RESEARCH AND DEVELOPMENT - APWEB 2005, 2005, 3399 : 523 - 529
  • [38] Finding (recently) frequent items in distributed data streams
    Manjhi, A
    Shkapenyuk, V
    Dhamdhere, K
    Olston, C
    [J]. ICDE 2005: 21ST INTERNATIONAL CONFERENCE ON DATA ENGINEERING, PROCEEDINGS, 2005, : 767 - 778
  • [39] Mining Frequent Itemsets from Data Streams with a Time-Sensitive Sliding Window
    Lin, Chih-Hsiang
    Chiu, Ding-Ying
    Wu, Yi-Hung
    Chen, Arbee L. P.
    [J]. PROCEEDINGS OF THE FIFTH SIAM INTERNATIONAL CONFERENCE ON DATA MINING, 2005, : 68 - 79
  • [40] Mining frequent closed itemsets from a landmark window over online data streams
    Liu, Xuejun
    Guan, Jihong
    Hu, Ping
    [J]. COMPUTERS & MATHEMATICS WITH APPLICATIONS, 2009, 57 (06) : 927 - 936