Parallel Bifold: Large-scale parallel pattern mining with constraints

Cited by: 4
Authors
El-Hajj, Mohammad [1]
Zaiane, Osmar R. [1]
Affiliation
[1] Univ Alberta, Dept Comp Sci, Edmonton, AB, Canada
Keywords
parallel data mining; frequent pattern mining; constraint-based mining; share-nothing memory
DOI
10.1007/s10619-006-0445-0
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
When computationally feasible, mining huge databases produces tremendously large numbers of frequent patterns. In many cases, it is impractical to mine those datasets due to their sheer size: not only the number of existing patterns, but mainly the magnitude of the search space. Many approaches have suggested applying constraints to the patterns or searching for frequent patterns in parallel. So far, those approaches have not been genuinely effective for mining extremely large datasets. We propose a method that combines both strategies efficiently, i.e., mining in parallel for the set of patterns while pushing constraints. Using this approach, we could mine significantly large datasets, with sizes never before reported in the literature. We are able to effectively discover frequent patterns in a database of a billion transactions using a 32-processor cluster in less than an hour and a half.
Pages: 225-243
Number of pages: 19
Source
Distributed and Parallel Databases, 2006, 20: 225-243