Mining algorithms for sequential patterns in parallel: Hash based approach

Cited by: 0
Authors
Shintani, T [1]
Kitsuregawa, M [1]
Affiliations
[1] Univ Tokyo, Inst Ind Sci, Minato Ku, Tokyo 106, Japan
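Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract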
In this paper, we study the problem of mining sequential patterns in a large database of customer transactions. Since finding sequential patterns requires handling a large amount of customer transaction data and making multiple passes over the database, parallel algorithms are expected to improve performance significantly. We consider parallel algorithms for mining sequential patterns in a shared-nothing environment. Three parallel algorithms are proposed: Non Partitioned Sequential Pattern Mining (NPSPM), Simply Partitioned Sequential Pattern Mining (SPSPM), and Hash Partitioned Sequential Pattern Mining (HPSPM). In NPSPM, the candidate sequences are simply copied to all the nodes, which can lead to memory overflow for large databases. The remaining two algorithms partition the candidate sequences over the nodes, which allows them to exploit the total memory of the system efficiently as the number of nodes is increased. If the candidates are partitioned simply, customer transaction data has to be broadcast to all nodes. HPSPM partitions the candidate sequences among the nodes using a hash function, which eliminates the broadcasting of customer transaction data and reduces the comparison workload. We describe the implementation of these algorithms on an IBM SP2 shared-nothing parallel computer and report the results of its performance evaluation. Among the three algorithms, HPSPM attains the best performance.
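The hash-partitioning idea in the abstract can be illustrated with a small single-process sketch. This is only one plausible reading of the abstract, not the paper's implementation: the hash function (CRC32 of the joined items), the helper names `owner`, `partition_candidates`, `route_subsequences` and `count_support`, and the simplification that each sequence element is a single item rather than an itemset are all assumptions made for illustration. Nodes are simulated as list indices, and "sending" a subsequence means looking it up in the owning node's candidate set.

```python
from itertools import combinations
from zlib import crc32


def owner(candidate, num_nodes):
    """Map a candidate sequence (tuple of item strings) to a node id.

    Assumption: CRC32 of the joined items stands in for the hash function.
    """
    key = ",".join(candidate).encode("utf-8")
    return crc32(key) % num_nodes


def partition_candidates(candidates, num_nodes):
    """Give each candidate to exactly one node, chosen by the hash."""
    parts = [set() for _ in range(num_nodes)]
    for cand in candidates:
        parts[owner(cand, num_nodes)].add(cand)
    return parts


def route_subsequences(customer_sequence, k, num_nodes):
    """Group the distinct length-k subsequences of one customer by owning node.

    combinations() keeps the original item order, so every tuple it yields
    is a subsequence of the customer sequence.
    """
    outgoing = {}
    for sub in set(combinations(customer_sequence, k)):
        outgoing.setdefault(owner(sub, num_nodes), set()).add(sub)
    return outgoing


def count_support(database, candidates, k, num_nodes):
    """Count, per candidate, how many customers contain it.

    Each subsequence is compared only against the candidates held by the
    node it hashes to, so no customer sequence is broadcast to every node.
    """
    parts = partition_candidates(candidates, num_nodes)
    counts = {cand: 0 for cand in candidates}
    for seq in database:
        for node, subs in route_subsequences(seq, k, num_nodes).items():
            for sub in subs & parts[node]:
                counts[sub] += 1
    return counts
```

For example, calling `count_support` on a handful of short customer sequences with a list of length-2 candidates returns a dictionary of per-candidate customer counts. Because a candidate and any matching subsequence hash to the same node, the result does not depend on the number of simulated nodes.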
Pages: 283-294
Page count: 12