Mining algorithms for sequential patterns in parallel: Hash based approach

Cited by: 0
Authors
Shintani, T [1]
Kitsuregawa, M [1]
Affiliation
[1] Univ Tokyo, Inst Ind Sci, Minato Ku, Tokyo 106, Japan
DOI: not available
Chinese Library Classification (CLC) number: TP18 [Artificial intelligence theory]
Subject classification codes: 081104; 0812; 0835; 1405
Abstract
In this paper, we study the problem of mining sequential patterns in a large database of customer transactions. Because finding sequential patterns requires handling a large amount of customer transaction data and making multiple passes over the database, parallel algorithms are expected to improve performance significantly. We consider parallel algorithms for mining sequential patterns in a shared-nothing environment. Three parallel algorithms are proposed: Non Partitioned Sequential Pattern Mining (NPSPM), Simply Partitioned Sequential Pattern Mining (SPSPM), and Hash Partitioned Sequential Pattern Mining (HPSPM). In NPSPM, the candidate sequences are simply copied to all the nodes, which can lead to memory overflow for large databases. The other two algorithms partition the candidate sequences over the nodes, so they can efficiently exploit the total system memory as the number of nodes is increased. If the candidates are partitioned simply, as in SPSPM, customer transaction data has to be broadcast to all nodes. HPSPM partitions the candidate sequences among the nodes using a hash function, which eliminates the broadcasting of customer transaction data and reduces the comparison workload. We describe the implementation of these algorithms on the shared-nothing parallel computer IBM SP2 and present their performance evaluation results. Among the three algorithms, HPSPM attains the best performance.
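The hash-partitioning idea that distinguishes HPSPM can be illustrated with a small single-process simulation in Python. This is a minimal, hypothetical sketch, not the authors' IBM SP2 implementation: it assumes each sequence element is a single item (real customer sequences contain itemsets), represents nodes as list indices, and uses Python's built-in `hash` modulo the node count as the partitioning function. The helper names `owner`, `partition_candidates`, and `count_support` are invented for this illustration.

```python
from collections import Counter
from itertools import combinations


def owner(candidate, num_nodes):
    """Return the node that stores and counts this candidate (hypothetical hash partition)."""
    return hash(candidate) % num_nodes


def partition_candidates(candidates, num_nodes):
    """Place each candidate on exactly one node instead of copying it to every node."""
    tables = [{} for _ in range(num_nodes)]
    for cand in candidates:
        tables[owner(cand, num_nodes)][cand] = 0
    return tables


def count_support(local_data, candidates, k, num_nodes):
    """Simulate one counting pass over all nodes.

    local_data[i] holds the customer sequences stored on node i.
    Returns (support per candidate, number of inter-node messages).
    """
    tables = partition_candidates(candidates, num_nodes)
    messages = 0
    for node_id, sequences in enumerate(local_data):
        for seq in sequences:
            # Distinct length-k subsequences (order kept, gaps allowed); the set
            # ensures a customer adds at most 1 to any candidate's support.
            for sub in set(combinations(seq, k)):
                dest = owner(sub, num_nodes)
                if dest != node_id:
                    messages += 1  # routed to the single owner node, not broadcast
                if sub in tables[dest]:
                    tables[dest][sub] += 1
    support = Counter()
    for table in tables:
        support.update(table)
    return support, messages


# Usage: two simulated nodes, each holding its own customer sequences.
data = [
    [(1, 2, 3), (1, 3)],      # node 0
    [(2, 3), (1, 2, 3, 4)],   # node 1
]
cands = [(1, 2), (1, 3), (2, 3), (3, 4)]
support, msgs = count_support(data, cands, k=2, num_nodes=2)
print(sorted(support.items()))  # [((1, 2), 2), ((1, 3), 3), ((2, 3), 3), ((3, 4), 1)]
```

Because each candidate lives on exactly one node and each subsequence is sent only to the node that owns it, no customer sequence has to be broadcast, which mirrors the behavior described above for HPSPM. Changing `num_nodes` changes where candidates are stored but not the resulting support counts.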
Pages: 283-294
Page count: 12
Related papers (50 in total)
  • [21] A Geometric Approach for Mining Sequential Patterns in Interval-Based Data Streams
    Hassani, Marwan
    Lu, Yifeng
    Wischnewsky, Jens
    Seidl, Thomas
    2016 IEEE INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS (FUZZ-IEEE), 2016: 2128-2135
  • [22] Efficient algorithms for simultaneously mining concise representations of sequential patterns based on extended pruning conditions
    Hai Duong
    Tin Truong
    Bac Le
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2018, 67: 197-210
  • [23] An efficient approach for mining periodic sequential access patterns
    Zhou, BY
    Hui, SC
    Fong, ACM
    PRICAI 2004: TRENDS IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 2004, 3157: 485-494
  • [24] A data mining approach to discovering reliable sequential patterns
    Shyur, Huan-Jyh
    Jou, Chichang
    Chang, Keng
    JOURNAL OF SYSTEMS AND SOFTWARE, 2013, 86(08): 2196-2203
  • [25] Parallel compact hash algorithms for computational meshes
    Tumblin, Rebecka
    Ahrens, Peter
    Hartse, Sara
    Robey, Robert W.
    SIAM JOURNAL ON SCIENTIFIC COMPUTING, 2015, 37(01): C31-C53
  • [26] Improved Hash partition for parallel DBMS and parallel join algorithms
    Lu, Lina
    Meng, Hong
    Wei, Hengyi
    Yang, Maishun
    Sci Press, 2000, 37
  • [27] Cache performance on parallel hash join algorithms
    Moreno, ED
    Mucheroni, ML
    Kofuji, ST
    HIGH PERFORMANCE COMPUTING SYSTEMS AND APPLICATIONS, 2000, 541: 387-402
  • [28] A FSA-based approach for mining sequential patterns with user-specified skeletons
    Hang, XS
    Huang, H
    Yuan, HC
    Xiong, FL
    PROCEEDINGS OF THE 4TH WORLD CONGRESS ON INTELLIGENT CONTROL AND AUTOMATION, VOLS 1-4, 2002: 537-541
  • [29] Mining sequential patterns from data streams: a centroid approach
    Alice Marascu
    Florent Masseglia
    JOURNAL OF INTELLIGENT INFORMATION SYSTEMS, 2006, 27: 291-307
  • [30] Mining sequential patterns by pattern-growth: The PrefixSpan approach
    Pei, J
    Han, JW
    Mortazavi-Asl, B
    Wang, JY
    Pinto, H
    Chen, QM
    Dayal, U
    Hsu, MC
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2004, 16(11): 1424-1440