Balanced Parallel Frequent Pattern Mining Over Massive Data Stream

Cited by: 33
Authors
Fu, Xi [1]
Shi, Lei [1]
Li, Jing [1]
Affiliation
[1] University of Science and Technology of China, School of Computer Science and Technology, Hefei 230026, Anhui, People's Republic of China
DOI
10.1109/BigDataService.2017.15
CLC classification number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Frequent pattern mining plays an increasingly important role in a growing number of real-time data stream scenarios, such as large-scale order streams, network traffic monitoring, and web access record streams. The continuous, unbounded, and high-speed nature of massive data streams poses a major challenge to current frequent pattern mining approaches. The main difficulty is that, as the stream keeps arriving, infrequent patterns that were discarded earlier may become frequent again. In this paper, targeting the characteristics of real-time data streams, we propose a compact data structure, called the CPS-tree, that maintains and operates on the complete information of the data stream. Compared with related work, our algorithm dynamically supports large-scale data streams with a single-pass scan and can easily be applied to other stream processing environments. Moreover, load imbalance is a common problem in current frequent pattern mining. We analyze the features of data streams and propose a depth-based strategy to resolve the imbalance in our parallel algorithm. Putting these together, we propose BPFPMS, a balanced parallel frequent pattern mining algorithm over massive data streams, which dynamically and efficiently mines frequent patterns over large-scale data streams. Our experiments show that the algorithm achieves good speedup and a good degree of load balance across nodes at different degrees of parallelism.
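The abstract does not describe the CPS-tree's internals. As a rough illustration of the kind of structure it describes (a prefix tree that absorbs a transaction stream in a single scan, keeps every transaction so no discarded pattern is lost, and is periodically reordered so frequent items share paths), here is a minimal Python sketch. The class `StreamPrefixTree`, its methods, and the restructure-by-descending-frequency policy are hypothetical assumptions for illustration only; this is not the authors' CPS-tree, and it omits both the pattern-mining step and the depth-based parallel load-balancing strategy.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, Iterable, Iterator, Optional, Tuple


@dataclass
class Node:
    item: Optional[str]
    count: int = 0
    children: Dict[str, "Node"] = field(default_factory=dict)


class StreamPrefixTree:
    """Hypothetical one-pass prefix tree over a transaction stream (not the paper's CPS-tree)."""

    def __init__(self) -> None:
        self.root = Node(None)
        self.item_counts = Counter()  # running support of each single item
        self.rank: Dict[str, int] = {}  # item order frozen at the last restructure

    def _key(self, item: str):
        # Ranked items follow the frozen order; unseen items go last, alphabetically.
        return (self.rank.get(item, len(self.rank)), item)

    def insert(self, transaction: Iterable[str]) -> None:
        # Single pass: each arriving transaction is added once and never rescanned.
        node = self.root
        for item in sorted(set(transaction), key=self._key):
            self.item_counts[item] += 1
            node = node.children.setdefault(item, Node(item))
            node.count += 1

    def transactions(self) -> Iterator[Tuple[Tuple[str, ...], int]]:
        # A node's count minus its children's counts = transactions ending at that node.
        stack = [(self.root, ())]
        while stack:
            node, path = stack.pop()
            ending = node.count - sum(c.count for c in node.children.values())
            if path and ending > 0:
                yield path, ending
            for child in node.children.values():
                stack.append((child, path + (child.item,)))

    def restructure(self) -> None:
        # Rebuild with items in descending-frequency order so common prefixes merge.
        stored = list(self.transactions())
        ordered = sorted(self.item_counts.items(), key=lambda kv: (-kv[1], kv[0]))
        self.rank = {item: r for r, (item, _) in enumerate(ordered)}
        self.root = Node(None)
        for path, mult in stored:
            node = self.root
            for item in sorted(path, key=self._key):
                node = node.children.setdefault(item, Node(item))
                node.count += mult

    def support(self, itemset: Iterable[str]) -> int:
        # Exact support: number of stored transactions containing the itemset.
        wanted = set(itemset)
        return sum(m for path, m in self.transactions() if wanted <= set(path))

    def node_count(self) -> int:
        stack, n = [self.root], 0
        while stack:
            node = stack.pop()
            n += 1
            stack.extend(node.children.values())
        return n - 1  # exclude the root


if __name__ == "__main__":
    tree = StreamPrefixTree()
    for t in [["a", "b"], ["b", "c"], ["a", "b", "c"], ["b"]]:
        tree.insert(t)
    print(tree.node_count(), tree.support({"a", "b"}))  # 5 2
    tree.restructure()
    print(tree.node_count(), tree.support({"a", "b"}))  # 4 2
```

Because every transaction stays in the tree, the exact support of any itemset can be recovered later, which is one way to address the abstract's concern that a pattern discarded as infrequent may become frequent again. Reordering by descending frequency shrinks the tree (5 nodes to 4 in the demo) without changing any support count.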
Pages: 50-59
Number of pages: 10