Efficient Large-Scale GPS Trajectory Compression on Spark: A Pipeline-Based Approach

Cited by: 3
Authors
Xiong, Wen [1, 2]
Wang, Xiaoxuan [1, 2]
Li, Hao [1]
Affiliations
[1] Yunnan Normal Univ, Sch Informat, Kunming 650500, Peoples R China
[2] Engn Res Ctr Comp Vis & Intelligent Control Techno, Yunnan Prov Dept Educ, Kunming 650500, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
trajectory compression; big data; Spark; parallelized algorithm; MapReduce
DOI
10.3390/electronics12173569
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Every day, hundreds of thousands of vehicles, including buses, taxis, and ride-hailing cars, continuously generate GPS positioning records. At the same time, the traffic big data platform of urban transportation systems has already collected a large number of GPS trajectory datasets. These incremental and historical GPS datasets require more and more storage space, placing unprecedented cost pressure on the big data platform. It is therefore imperative to compress these large-scale GPS trajectory datasets efficiently, reducing both storage cost and subsequent computing cost. However, classical trajectory compression algorithms can only be executed in a single-threaded manner and are limited to a single-node environment. As a result, they cannot compress this incremental data, which often amounts to hundreds of gigabytes, within an acceptable time frame. This paper uses Spark, a popular big data processing engine, to parallelize a set of classical trajectory compression algorithms: DP (Douglas-Peucker), TD-TR (Top-Down Time-Ratio), SW (Sliding Window), SQUISH (Spatial Quality Simplification Heuristic), and V-DP (Velocity-Aware Douglas-Peucker). We systematically evaluate these parallelized algorithms on a very large GPS trajectory dataset, which contains 117.5 GB of data produced by 20,000 taxis. The experimental results show that (1) it takes only 438 s to compress this dataset in a Spark cluster with 14 nodes, and (2) these parallelized algorithms save an average of 26% on storage cost, and up to 40%. In addition, we design and implement a pipeline-based solution that automatically performs preprocessing and compression for continuous GPS trajectories on the Spark platform.
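As a rough illustration of the first algorithm named in the abstract, the following is a minimal single-trajectory sketch of Douglas-Peucker simplification. It is not the authors' implementation and does not show their Spark parallelization. The function names, the `epsilon` tolerance parameter, and the use of planar (x, y) coordinates instead of GPS latitude/longitude with a geodesic distance are all assumptions made for illustration.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # assumed planar (x, y); real GPS data would need projection


def perpendicular_distance(p: Point, a: Point, b: Point) -> float:
    """Distance from p to the infinite line through a and b."""
    (x, y), (x1, y1), (x2, y2) = p, a, b
    dx, dy = x2 - x1, y2 - y1
    if dx == 0 and dy == 0:
        # a and b coincide, so fall back to point-to-point distance
        return math.hypot(x - x1, y - y1)
    return abs(dy * (x - x1) - dx * (y - y1)) / math.hypot(dx, dy)


def douglas_peucker(points: List[Point], epsilon: float) -> List[Point]:
    """Keep the endpoints, then recursively keep the farthest point
    whenever it lies more than epsilon away from the current chord."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    max_dist, index = 0.0, 0
    for i in range(1, len(points) - 1):
        d = perpendicular_distance(points[i], first, last)
        if d > max_dist:
            max_dist, index = d, i
    if max_dist <= epsilon:
        # every interior point is within tolerance, so drop them all
        return [first, last]
    left = douglas_peucker(points[: index + 1], epsilon)
    right = douglas_peucker(points[index:], epsilon)
    # the split point ends `left` and starts `right`; keep it once
    return left[:-1] + right


trajectory = [(0.0, 0.0), (1.0, 0.05), (2.0, 0.0), (3.0, 3.0), (4.0, 0.0)]
simplified = douglas_peucker(trajectory, epsilon=0.5)
```

Because each trajectory is simplified independently, a routine of this kind could in principle be applied per vehicle after grouping records by vehicle ID, which is one plausible way such an algorithm lends itself to data-parallel execution. Whether the paper takes this route is not stated in the abstract.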
Pages: 21