A Data Cleaning Method for Big Trace Data Using Movement Consistency

Cited by: 10
Authors
Yang, Xue [1]
Tang, Luliang [1]
Zhang, Xia [2]
Li, Qingquan [3]
Affiliations
[1] Wuhan Univ, State Key Lab Informat Engn Surveying Mapping & R, Wuhan 430079, Hubei, Peoples R China
[2] Wuhan Univ, Sch Urban Design, Wuhan 430070, Hubei, Peoples R China
[3] Shenzhen Univ, Coll Civil Engn, Shenzhen 518060, Peoples R China
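Source
SENSORS | 2018 / Vol. 18 / Issue 03
Funding
National Key Research and Development Program of China; National Natural Science Foundation of China;
Keywords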
data cleaning; big data; vehicle trajectory; movement consistency modeling; GPS;
DOI
10.3390/s18030824
Chinese Library Classification number
O65 [Analytical Chemistry];
Discipline classification codes
070302 ; 081704 ;
Abstract
Given the popularization of GPS technologies, the massive amount of spatiotemporal GPS traces collected by vehicles is becoming a new kind of big data source for urban geographic information extraction. The growing volume of the dataset, however, creates processing and management difficulties, while the low quality generates uncertainties when investigating human activities. Based on the error distribution law and position accuracy of GPS data, we propose in this paper a data cleaning method for this kind of spatial big data using movement consistency. First, a trajectory is partitioned into a set of sub-trajectories using movement characteristic points. In this process, GPS points indicating that the motion status of the vehicle has changed from one state into another are regarded as the movement characteristic points. Then, GPS data are cleaned based on the similarities of GPS points and the movement consistency model of the sub-trajectory. The movement consistency model is built using the random sample consensus algorithm based on the high spatial consistency of high-quality GPS data. The proposed method is evaluated through extensive experiments using GPS trajectories generated by a sample of vehicles over a 7-day period in Wuhan city, China. The results show the effectiveness and efficiency of the proposed method.
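The cleaning step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that points are already projected to planar coordinates in metres, that one sub-trajectory is roughly straight, and that the movement consistency model is a single line fitted with random sample consensus. The function names `ransac_line` and `clean_sub_trajectory`, the 5 m threshold, and the iteration count are hypothetical choices made for illustration.

```python
import math
import random


def ransac_line(points, threshold, iterations=200, seed=0):
    """Fit a 2D line with RANSAC.

    Returns (inlier_mask, (a, b, c)) for the line a*x + b*y + c = 0
    with a unit normal (a, b). Assumes planar coordinates in metres.
    """
    if len(points) < 2:
        return [True] * len(points), None
    rng = random.Random(seed)
    best_mask, best_line, best_count = [True] * len(points), None, -1
    for _ in range(iterations):
        # Minimal sample: two points define a candidate line.
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        a, b = y2 - y1, x1 - x2
        norm = math.hypot(a, b)
        if norm == 0:  # coincident points, no line defined
            continue
        c = -(a * x1 + b * y1)
        # A point is an inlier if its perpendicular distance is small.
        mask = [abs(a * x + b * y + c) / norm <= threshold for x, y in points]
        count = sum(mask)
        if count > best_count:
            best_mask = mask
            best_line = (a / norm, b / norm, c / norm)
            best_count = count
    return best_mask, best_line


def clean_sub_trajectory(points, threshold=5.0):
    """Keep only points consistent with the fitted line (hypothetical helper)."""
    mask, _ = ransac_line(points, threshold)
    return [p for p, keep in zip(points, mask) if keep]


if __name__ == "__main__":
    # Ten points along a straight road plus one off-road outlier.
    track = [(10.0 * i, 0.0) for i in range(10)] + [(50.0, 40.0)]
    print(clean_sub_trajectory(track))
```

In this sketch the partitioning into sub-trajectories at movement characteristic points and the similarity check between GPS points are left out; only the consensus-based filtering of a single sub-trajectory is shown.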
Pages: 16
Related papers
50 in total
  • [21] Histograms as a Side Effect of Data Movement for Big Data
    Istvan, Zsolt
    Woods, Louis
    Alonso, Gustavo
    SIGMOD'14: PROCEEDINGS OF THE 2014 ACM SIGMOD INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2014 : 1567 - 1578
  • [22] Trace-Based Method for Big Data Memory Characteristics Research
    Ma, Jianqiao
    Yu, Qi
    Huang, Libo
    Qian, Cheng
    Wang, Zhiying
    2017 INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTING, COMMUNICATIONS AND INFORMATICS (ICACCI), 2017 : 1023 - 1027
  • [23] Big Data Cleaning Algorithms in Cloud Computing
    Feng, Zhang
    Hui-Feng, Xue
    Dong-Sheng, Xu
    Yong-Heng, Zhang
    Fei, You
    INTERNATIONAL JOURNAL OF ONLINE ENGINEERING, 2013, 9 (03) : 77 - 81
  • [24] Cleanix: a Parallel Big Data Cleaning System
    Wang, Hongzhi
    Li, Mingda
    Bu, Yingyi
    Li, Jianzhong
    Gao, Hong
    Zhang, Jiacheng
    SIGMOD RECORD, 2015, 44 (04) : 35 - 40
  • [25] The cleaning method of duplicate big data based on association rule mining algorithm
    Wu, Ming
    INTERNATIONAL JOURNAL OF AUTONOMOUS AND ADAPTIVE COMMUNICATIONS SYSTEMS, 2023, 16 (02) : 220 - 231
  • [26] A Density-based Data Cleaning Approach for Deduplication with Data Consistency and Accuracy
    Al-Janabi, Samir
    Janicki, Ryszard
    PROCEEDINGS OF THE 2016 SAI COMPUTING CONFERENCE (SAI), 2016 : 492 - 501
  • [27] A SYSTEMATIC MAPPING REVIEW ON DATA CLEANING METHODS IN BIG DATA ENVIRONMENTS
    Iwata, Claudio Keiji
    Galegale, Napoleao Verardi
    Ito, Marcia
    de Azevedo, Marilia Macorin
    Feitosa, Marcelo Duduchi
    Arima, Carlos Hideo
    IADIS-INTERNATIONAL JOURNAL ON COMPUTER SCIENCE AND INFORMATION SYSTEMS, 2024, 19 (02) : 19 - 36
  • [28] Collecting Vertical Trace Data: Big Possibilities and Big Challenges for Multi-Method Research
    Menchen-Trevino, Ericka
    POLICY AND INTERNET, 2013, 5 (03) : 328 - 339
  • [29] Mining "Big Data" using Big Data Services
    Reips, Ulf-Dietrich
    Matzat, Uwe
    INTERNATIONAL JOURNAL OF INTERNET SCIENCE, 2014, 9 (01) : 1 - 8
  • [30] A Cleaning Method of Noise Data in RFID Data Streams
    Hu, Kongfa
    Li, Long
    Lu, Zhipeng
    2013 3RD INTERNATIONAL CONFERENCE ON CONSUMER ELECTRONICS, COMMUNICATIONS AND NETWORKS (CECNET), 2013 : 1 - 4