Distributed and scalable sequential pattern mining through stream processing

Cited by: 19
Authors
Chen, Chun-Chieh [1, 2]
Shuai, Hong-Han [3]
Chen, Ming-Syan [2, 4]
Affiliations
[1] Natl Taiwan Univ, Grad Inst Networking & Multimedia, Taipei, Taiwan
[2] Acad Sinica, Res Ctr Informat Technol Innovat, Taipei, Taiwan
[3] Natl Chiao Tung Univ, Dept Elect & Comp Engn, Hsinchu, Taiwan
[4] Natl Taiwan Univ, Dept Elect Engn, Taipei, Taiwan
Keywords
Sequential pattern mining; Data mining; Cloud computing; MapReduce; Big data; Streaming MapReduce
DOI
10.1007/s10115-017-1037-1
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Scalability is a primary issue for existing sequential pattern mining algorithms when dealing with large amounts of data. Previous work, namely sequential pattern mining on the cloud (SPAMC), has already addressed the scalability problem: it uses the MapReduce cloud computing architecture to mine frequent sequential patterns from large datasets. However, SPAMC does not address the iterative mining problem, in which reloading data incurs additional costs, nor does it study the load balancing problem. To remedy these problems, we devised a sequential pattern mining algorithm, the sequential pattern mining in the cloud-uniform distributed lexical sequence tree algorithm (SPAMC-UDLT), which exploits MapReduce and stream processing. SPAMC-UDLT dramatically improves overall performance without launching multiple MapReduce rounds and provides perfect load balancing across machines in the cloud. The results show that SPAMC-UDLT significantly reduces execution time, achieves extremely high scalability, and provides much better load balancing than existing algorithms in the cloud.
Pages: 365-390
Number of pages: 26
Related papers
50 in total
  • [1] Distributed and scalable sequential pattern mining through stream processing
    Chun-Chieh Chen
    Hong-Han Shuai
    Ming-Syan Chen
    Knowledge and Information Systems, 2017, 53 : 365 - 390
  • [2] A scalable sequential pattern mining algorithm
    Wang, Jiahong
    Asanuma, Yoshiaki
    Kodama, Eiichiro
    Takata, Toyoo
    2006 IEEE INTERNATIONAL CONFERENCE ON COMPUTER SYSTEMS AND APPLICATIONS, VOLS 1-3, 2006, : 437 - +
  • [3] A Stream Sequential Pattern Mining Model
    Li, Haifeng
    2011 INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND NETWORK TECHNOLOGY (ICCSNT), VOLS 1-4, 2012, : 704 - 707
  • [4] Scalable Distributed Stream Join Processing
    Lin, Qian
    Ooi, Beng Chin
    Wang, Zhengkui
    Yu, Cui
    SIGMOD'15: PROCEEDINGS OF THE 2015 ACM SIGMOD INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2015, : 811 - 825
  • [5] Sequential Pattern Mining from Stream Data
    Koper, Adam
    Hung Son Nguyen
    ADVANCED DATA MINING AND APPLICATIONS, PT II, 2011, 7121 : 278 - 291
  • [6] Scalable and parallel sequential pattern mining using spark
    Xiao Yu
    Qing Li
    Jin Liu
    World Wide Web, 2019, 22 : 295 - 324
  • [7] Scalable and parallel sequential pattern mining using spark
    Yu, Xiao
    Li, Qing
    Liu, Jin
    WORLD WIDE WEB-INTERNET AND WEB INFORMATION SYSTEMS, 2019, 22 (01): : 295 - 324
  • [8] A Fuzzy Constrained Stream Sequential Pattern Mining Algorithm
    Shakeri, Omid
    Pedram, Mir Mohsen
    Kelarestaghi, Manoochehr
    2014 7th International Symposium on Telecommunications (IST), 2014, : 20 - 24
  • [9] Stream Sequential Pattern Mining with Precise Error Bounds
    Mendes, Luiz F.
    Ding, Bolin
    Han, Jiawei
    ICDM 2008: EIGHTH IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS, 2008, : 941 - 946
  • [10] Privacy Preserving Sequential Pattern Mining in Data Stream
    Huang, Qin-Hua
    ADVANCED INTELLIGENT COMPUTING THEORIES AND APPLICATIONS, PROCEEDINGS: WITH ASPECTS OF CONTEMPORARY INTELLIGENT COMPUTING TECHNIQUES, 2008, 15 : 69 - 75