Efficient Large-Scale GPS Trajectory Compression on Spark: A Pipeline-Based Approach

Cited by: 3
Authors
Xiong, Wen [1, 2]
Wang, Xiaoxuan [1, 2]
Li, Hao [1 ]
Affiliations
[1] Yunnan Normal Univ, Sch Informat, Kunming 650500, Peoples R China
[2] Engn Res Ctr Comp Vis & Intelligent Control Techno, Yunnan Prov Dept Educ, Kunming 650500, Peoples R China
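Funding
National Natural Science Foundation of China
Keywords
trajectory compression; big data; Spark; parallelized algorithm; MapReduce
DOI
10.3390/electronics12173569
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract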
Every day, hundreds of thousands of vehicles, including buses, taxis, and ride-hailing cars, continuously generate GPS positioning records. At the same time, the traffic big data platforms of urban transportation systems have already accumulated a large volume of GPS trajectory data. These incremental and historical GPS datasets require ever more storage space, placing unprecedented cost pressure on the big data platform. It is therefore imperative to compress these large-scale GPS trajectory datasets efficiently, saving both storage cost and subsequent computing cost. However, classical trajectory compression algorithms can only be executed in a single-threaded manner and are limited to a single-node environment, so they cannot compress this incremental data, which often amounts to hundreds of gigabytes, within an acceptable time frame. This paper uses Spark, a popular big data processing engine, to parallelize a set of classical trajectory compression algorithms: DP (Douglas-Peucker), TD-TR (Top-Down Time-Ratio), SW (Sliding Window), SQUISH (Spatial Quality Simplification Heuristic), and V-DP (Velocity-Aware Douglas-Peucker). We systematically evaluate the parallelized algorithms on a very large GPS trajectory dataset containing 117.5 GB of data produced by 20,000 taxis. The experimental results show that: (1) compressing this dataset takes only 438 s on a Spark cluster with 14 nodes; (2) the parallelized algorithms save 26% of storage cost on average, and up to 40%. In addition, we design and implement a pipeline-based solution that automatically performs preprocessing and compression for continuous GPS trajectories on the Spark platform.
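To make the DP (Douglas-Peucker) algorithm named in the abstract concrete, the following is a minimal single-node sketch in Python. It is not the authors' Spark implementation: the function names, the `(vehicle_id, timestamp, x, y)` record layout, and the use of planar coordinates are all assumptions made for illustration. Real GPS data would first need a projection (or a geodesic distance), and in a Spark setting the per-vehicle grouping shown here would presumably be expressed as a keyed, distributed operation.

```python
import math
from collections import defaultdict


def perpendicular_distance(p, a, b):
    """Distance from point p to the line through a and b (planar coordinates)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        # a and b coincide, so fall back to the point-to-point distance.
        return math.hypot(px - ax, py - ay)
    return abs(dy * (px - ax) - dx * (py - ay)) / math.hypot(dx, dy)


def douglas_peucker(points, epsilon):
    """Simplify a polyline, keeping points farther than epsilon from the chord."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    max_dist, index = 0.0, 0
    for i in range(1, len(points) - 1):
        d = perpendicular_distance(points[i], first, last)
        if d > max_dist:
            max_dist, index = d, i
    if max_dist <= epsilon:
        # Every interior point is close enough to the chord: drop them all.
        return [first, last]
    # Otherwise split at the farthest point and simplify both halves.
    left = douglas_peucker(points[: index + 1], epsilon)
    right = douglas_peucker(points[index:], epsilon)
    return left[:-1] + right  # the split point appears in both halves


def compress_by_vehicle(records, epsilon):
    """Group hypothetical (vehicle_id, timestamp, x, y) records and compress each."""
    grouped = defaultdict(list)
    for vehicle_id, timestamp, x, y in records:
        grouped[vehicle_id].append((timestamp, x, y))
    result = {}
    for vehicle_id, pts in grouped.items():
        pts.sort()  # order each trajectory by timestamp
        result[vehicle_id] = douglas_peucker([(x, y) for _, x, y in pts], epsilon)
    return result
```

Because each vehicle's trajectory is compressed independently of the others, the grouping step is the natural unit of parallelism; that independence is what makes this family of algorithms a plausible fit for a data-parallel engine.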
Pages: 21
Related papers
50 in total
  • [21] An autoencoder compression approach for accelerating large-scale inverse problems
    Wittmer, Jonathan
    Badger, Jacob
    Sundar, Hari
    Bui-Thanh, Tan
    INVERSE PROBLEMS, 2023, 39 (11)
  • [22] Efficient Decomposition Approach for Large-Scale Refinery Scheduling
    Shah, Nikisha K.
    Sahay, Nihar
    Ierapetritou, Marianthi G.
    INDUSTRIAL & ENGINEERING CHEMISTRY RESEARCH, 2015, 54 (41) : 9964 - 9991
  • [23] Embedding Compression with Hashing for Efficient Representation Learning in Large-Scale Graph
    Yeh, Chin-Chia Michael
    Gu, Mengting
    Zheng, Yan
    Chen, Huiyuan
    Ebrahimi, Javid
    Zhuang, Zhongfang
    Wang, Junpeng
    Wang, Liang
    Zhang, Wei
    PROCEEDINGS OF THE 28TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, KDD 2022, 2022, : 4391 - 4401
  • [24] Visualization of large-scale trajectory datasets
    Zachar, Gergely
    2023 CYBER-PHYSICAL SYSTEMS AND INTERNET-OF-THINGS WEEK, CPS-IOT WEEK WORKSHOPS, 2023, : 152 - 157
  • [25] Fast Large-Scale Trajectory Clustering
    Wang, Sheng
    Bao, Zhifeng
    Culpepper, J. Shane
    Sellis, Timos
    Qin, Xiaolin
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2019, 13 (01): : 29 - 42
  • [26] Optimizing large-scale hydrogen storage: A novel hybrid genetic algorithm approach for efficient pipeline network design
    Liu, Shitao
    Zhou, Jun
    Liang, Guangchuan
    Du, Penghua
    Li, Zichen
    Li, Chengyu
    INTERNATIONAL JOURNAL OF HYDROGEN ENERGY, 2024, 66 : 430 - 444
  • [27] Efficient Training of Large-Scale Neural Networks Using Linear Pipeline Broadcast
    University of Science and Technology, Department of Big Data Science, Daejeon
    34112, Korea, Republic of
    Unknown
    34141, Korea, Republic of
    Unknown
    34112, Korea, Republic of
    IEEE ACCESS, 2024 : 165653 - 165662
  • [28] GPS NAVIGATION FOR LARGE-SCALE PHOTOGRAPHY
    BIGGS, PH
    PEARCE, CJ
    WESTCOTT, TJ
    PHOTOGRAMMETRIC ENGINEERING AND REMOTE SENSING, 1989, 55 (12): : 1737 - 1741
  • [29] A novel compression approach for truck GPS trajectory data
    Liu, Sijing
    Chen, Gang
    Wei, Long
    Li, Guoqi
    IET INTELLIGENT TRANSPORT SYSTEMS, 2021, 15 (01) : 74 - 83
  • [30] A pipeline-based approach for long transaction processing in web service environments
    Tang, Feilong
    You, Ilsun
    Li, Li
    Wang, Cho-Li
    Cheng, Zixue
    Guo, Song
    INTERNATIONAL JOURNAL OF WEB AND GRID SERVICES, 2011, 7 (02) : 190 - 207