Balanced Parallel Frequent Pattern Mining Over Massive Data Stream

Cited by: 33
Authors
Fu, Xi [1]
Shi, Lei [1]
Li, Jing [1]
Affiliations
[1] Univ Sci & Technol China, Sch Comp Sci & Technol, Hefei 230026, Anhui, Peoples R China
DOI
10.1109/BigDataService.2017.15
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Subject classification code
0812;
Abstract
Frequent pattern mining plays an increasingly important role in a growing number of real-time data flow scenarios, such as large-scale order streams, network traffic monitoring, and web access record streams. The continuous, unbounded, and high-speed nature of massive data streams poses a major challenge for current frequent pattern mining approaches. The main difficulty is that, as the data stream keeps arriving, infrequent patterns that have been discarded may become frequent again. In this paper, targeting the characteristics of real-time data streams, we propose a compact data structure, called the CPS-tree, to maintain and operate on the full information of the data stream. Compared with related work, our algorithm dynamically supports large-scale data streams with a one-pass scan and can be easily applied to other data stream processing environments. Moreover, load imbalance is a common problem in current frequent pattern mining. We analyze the features of the data stream and propose a depth-based strategy to solve the imbalance problem in our parallel algorithm. In summary, we propose BPFPMS, a balanced parallel frequent pattern mining algorithm over massive data streams, to dynamically and efficiently mine frequent patterns over large-scale data streams. Our experiments show that the algorithm achieves good speedup and a good degree of load balance among nodes at different degrees of parallelism.
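The abstract does not spell out the internals of the CPS-tree or of the depth-based balancing strategy, so the following is only a minimal sketch of the two ideas under stated assumptions, not the paper's algorithm. It uses a plain prefix tree that keeps every transaction in one pass (nothing is pruned, so a pattern that is infrequent now can still be counted later) and a greedy partition of items across workers. The names `StreamPrefixTree`, `support`, and `balance_by_depth`, and the choice of "sum of node depths" as the load estimate, are hypothetical.

```python
import heapq
from collections import defaultdict


class Node:
    def __init__(self, item=None, parent=None):
        self.item = item
        self.parent = parent
        self.count = 0
        self.children = {}


class StreamPrefixTree:
    """Generic prefix tree over a transaction stream (not the CPS-tree itself)."""

    def __init__(self):
        self.root = Node()
        self.nodes_by_item = defaultdict(list)  # item -> all tree nodes holding it
        self.n_transactions = 0

    def insert(self, transaction):
        """One-pass update: each arriving transaction is inserted exactly once."""
        self.n_transactions += 1
        node = self.root
        for item in sorted(set(transaction)):  # fixed canonical item order
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                self.nodes_by_item[item].append(child)
            child.count += 1
            node = child

    def support(self, itemset):
        """Exact support, possible because no infrequent branch was discarded."""
        items = sorted(set(itemset))
        if not items:
            return self.n_transactions
        last, rest = items[-1], set(items[:-1])
        total = 0
        for node in self.nodes_by_item.get(last, []):
            ancestors = set()
            p = node.parent
            while p is not None and p.item is not None:
                ancestors.add(p.item)
                p = p.parent
            if rest <= ancestors:
                total += node.count
        return total


def depth(node):
    """Number of edges between the node and the root."""
    d = 0
    while node.parent is not None:
        d += 1
        node = node.parent
    return d


def balance_by_depth(tree, n_workers):
    """Greedy partition: heaviest item first, always to the least-loaded worker.

    Assumption: the cost of mining an item is estimated by the summed depth
    of its nodes; the paper's actual estimate may differ.
    """
    loads = {item: sum(depth(n) for n in nodes)
             for item, nodes in tree.nodes_by_item.items()}
    heap = [(0, w) for w in range(n_workers)]
    assignment = {w: [] for w in range(n_workers)}
    for item, load in sorted(loads.items(), key=lambda kv: (-kv[1], kv[0])):
        total, w = heapq.heappop(heap)
        assignment[w].append(item)
        heapq.heappush(heap, (total + load, w))
    return assignment
```

A typical use would be to call `insert` for each arriving transaction, query `support` whenever a pattern's frequency is needed, and call `balance_by_depth` before dispatching per-item mining tasks to workers.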
Pages: 50-59
Page count: 10
Related papers
50 entries in total
  • [41] Abnormal Detecting over Data Stream Based on Maximal Pattern Mining Technology
    Cai, Saihua
    Sun, Ruizhi
    Li, Jiayao
    Deng, Chao
    Li, Sicong
    [J]. COMPUTER SUPPORTED COOPERATIVE WORK AND SOCIAL COMPUTING, CHINESECSCW 2018, 2019, 917 : 371 - 385
  • [42] FREQUENT PATTERN MINING OVER MOVIE PLOT KEYWORDS
    Arslan, Ahmet
    Yilmazel, Ozgur
    [J]. 2011 INTERNATIONAL CONFERENCE ON COMPUTER AND COMPUTATIONAL INTELLIGENCE (ICCCI 2011), 2012, : 71 - 74
  • [43] An Efficient Outlier Detection Approach Over Uncertain Data Stream Based on Frequent Itemset Mining
    Hao, Shangbo
    Cai, Saihua
    Sun, Ruizhi
    Li, Sicong
    [J]. INFORMATION TECHNOLOGY AND CONTROL, 2019, 48 (01): : 34 - 46
  • [44] Sliding window based weighted maximal frequent pattern mining over data streams
    Lee, Gangin
    Yun, Unil
    Ryu, Keun Ho
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2014, 41 (02) : 694 - 708
  • [45] Parallel and Distributed Algorithms for Frequent Pattern Mining in Large Databases
    Tanbeer, Syed Khairuzzaman
    Ahmed, Chowdhury Farhan
    Jeong, Byeong-Soo
    [J]. IETE TECHNICAL REVIEW, 2009, 26 (01) : 55 - 66
  • [46] RESEARCH ON PARALLEL FREQUENT PATTERN MINING BASED ON ONTOLOGY AND RULES
    Yi, Chenxi
    Sun, Ming
    [J]. 4TH INTERNATIONAL CONFERENCE ON SMART AND SUSTAINABLE CITY (ICSSC 2017), 2017, : 33 - 37
  • [47] Load balancing approach parallel algorithm for frequent pattern mining
    Yu, Kun-Ming
    Zhou, Jiayi
    Hsia, Wei Chen
    [J]. PARALLEL COMPUTING TECHNOLOGIES, PROCEEDINGS, 2007, 4671 : 623 - +
  • [48] Parallel Frequent Pattern Mining without Candidate Generation on GPUs
    Wang, Fei
    Yuan, Bo
    [J]. 2014 IEEE INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOP (ICDMW), 2014, : 1046 - 1052
  • [49] A parallel approach for high utility-based frequent pattern mining in a big data environment
    Krishna Kumar Mohbey
    Sunil Kumar
    [J]. Iran Journal of Computer Science, 2021, 4 (3) : 195 - 200
  • [50] Sequential Pattern Mining from Stream Data
    Koper, Adam
    Hung Son Nguyen
    [J]. ADVANCED DATA MINING AND APPLICATIONS, PT II, 2011, 7121 : 278 - 291