A MapReduce-Based Parallel Frequent Pattern Growth Algorithm for Spatiotemporal Association Analysis of Mobile Trajectory Big Data

Cited by: 14
Authors
Xia, Dawen [1, 2, 3]
Lu, Xiaonan [1, 2]
Li, Huaqing [4]
Wang, Wendong [3]
Li, Yantao [3]
Zhang, Zili [3, 5]
Affiliations
[1] Guizhou Minzu Univ, Coll Data Sci & Informat Engn, Guiyang 550025, Guizhou, Peoples R China
[2] Guizhou Minzu Univ, Coll Natl Culture & Cognit Sci, Guiyang 550025, Guizhou, Peoples R China
[3] Southwest Univ, Coll Comp & Informat Sci, Chongqing 400715, Peoples R China
[4] Southwest Univ, Coll Elect & Informat Engn, Chongqing 400715, Peoples R China
[5] Deakin Univ, Sch Informat Technol, Geelong, Vic 3216, Australia
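Funding
National Natural Science Foundation of China; China Postdoctoral Science Foundation
Keywords
TRANSPORTATION; CHALLENGES; HADOOP
DOI
10.1155/2018/2818251
Chinese Library Classification
O1 [Mathematics]
Discipline classification code
0701; 070101
Abstract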
Frequent pattern mining is an effective approach for spatiotemporal association analysis of mobile trajectory big data in data-driven intelligent transportation systems. While existing parallel algorithms have been successfully applied to frequent pattern mining of large-scale trajectory data, two major challenges remain: how to overcome the inherent defects of Hadoop when handling taxi trajectory big data that consists of massive numbers of small files, and how to discover implicit spatiotemporal frequent patterns with MapReduce. To address these challenges, this paper presents a MapReduce-based Parallel Frequent Pattern growth (MR-PFP) algorithm that analyzes the spatiotemporal characteristics of taxi operations using large-scale taxi trajectories, with strategies for processing massive small files on a Hadoop platform. More specifically, we first implement three methods, namely Hadoop Archives (HAR), CombineFileInputFormat (CFIF), and Sequence Files (SF), to overcome the existing defects of Hadoop, and then propose two strategies based on their performance evaluations. Next, we incorporate SF into the Frequent Pattern growth (FP-growth) algorithm and implement the optimized FP-growth algorithm on a MapReduce framework. Finally, we analyze the characteristics of taxi operations in both the spatial and temporal dimensions with MR-PFP in parallel. The results demonstrate that MR-PFP is superior to the existing Parallel FP-growth (PFP) algorithm in efficiency and scalability.
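The two ideas in the abstract, packing many small files into one keyed container and counting item support with a map step and a reduce step, can be illustrated with a minimal single-process sketch. This is not the authors' MR-PFP implementation and does not use Hadoop. The function names (`pack_small_files`, `map_items`, `reduce_counts`, `frequent_items`), the file names, and the toy item labels are all hypothetical, and the sketch covers only the frequent-item counting pass that typically precedes FP-tree construction.

```python
from collections import Counter
from itertools import chain


def pack_small_files(files):
    """Merge many small files into one list of (filename, line) records.

    Assumption: this mimics the key-value idea behind a sequence file,
    so that later steps read one container instead of many tiny files.
    """
    return [
        (name, line)
        for name, text in sorted(files.items())
        for line in text.splitlines()
        if line.strip()
    ]


def map_items(record):
    """Map step: emit (item, 1) for each distinct item in one transaction."""
    _, line = record
    return [(item, 1) for item in set(line.split())]


def reduce_counts(pairs):
    """Reduce step: sum the emitted counts per item."""
    counts = Counter()
    for item, n in pairs:
        counts[item] += n
    return counts


def frequent_items(files, min_support):
    """Return items whose support count is at least min_support."""
    records = pack_small_files(files)
    pairs = chain.from_iterable(map_items(r) for r in records)
    counts = reduce_counts(pairs)
    return {item: n for item, n in counts.items() if n >= min_support}


# Hypothetical toy data: each line is one transaction of discretized
# trajectory attributes (zone, hour, occupancy status).
files = {
    "taxi_001.txt": "zoneA hour08 occupied\nzoneB hour08 vacant",
    "taxi_002.txt": "zoneA hour08 occupied",
    "taxi_003.txt": "zoneA hour09 occupied",
}
result = frequent_items(files, min_support=3)
print(result)
```

In a real deployment the map and reduce functions would run on separate workers, and the frequent items found here would be used to group and order transactions before the FP-trees are built.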
Pages: 16