PPCA-Based Missing Data Imputation for Traffic Flow Volume: A Systematical Approach

Cited by: 301
Authors
Qu, Li [1]
Li, Li [1]
Zhang, Yi [1]
Hu, Jianming [1]
Affiliation
[1] Tsinghua University, Department of Automation, Beijing 100084, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Missing data; probabilistic principal component analysis (PPCA); traffic flow volume; principal component analysis; maximum likelihood
DOI
10.1109/TITS.2009.2026312
Chinese Library Classification (CLC) number
TU [Building Science]
Discipline classification code
0813
Abstract
The missing data problem greatly affects traffic analysis. In this paper, we put forward a new reliable method called probabilistic principal component analysis (PPCA) to impute the missing flow volume data based on historical data mining. First, we review the current missing data-imputation method and why it may fail to yield acceptable results in many traffic flow applications. Second, we examine the statistical properties of traffic flow volume time series. We show that the fluctuations of traffic flow are Gaussian type and that principal component analysis (PCA) can be used to retrieve the features of traffic flow. Third, we discuss how to use a robust PCA to filter out the abnormal traffic flow data that disturb the imputation process. Finally, we recall the theories of PPCA/Bayesian PCA-based imputation algorithms and compare their performance with some conventional methods, including the nearest/mean historical imputation methods and the local interpolation/regression methods. The experiments prove that the PPCA method provides significantly better performance than the conventional methods, reducing the root-mean-square imputation error by at least 25%.
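The contrast the abstract draws between PCA-style imputation and mean historical imputation can be illustrated with a minimal sketch. This is not the paper's PPCA algorithm: it swaps in a simpler iterative rank-k PCA reconstruction, and the synthetic day-by-time-slot matrix, the 10% missing rate, and the names `pca_impute` and `rmse` are all assumptions made for illustration only.

```python
import numpy as np

def pca_impute(X, n_components=2, n_iter=200, tol=1e-8):
    """Fill NaNs in X (days x time slots) by iterating a rank-k PCA reconstruction.

    Simplified stand-in for PPCA-based imputation, not the paper's method.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Initialise missing cells with the historical mean of each time slot.
    slot_mean = np.nanmean(X, axis=0)
    filled = np.where(missing, slot_mean, X)
    for _ in range(n_iter):
        mu = filled.mean(axis=0)
        U, s, Vt = np.linalg.svd(filled - mu, full_matrices=False)
        # Keep only the leading principal components.
        recon = (U[:, :n_components] * s[:n_components]) @ Vt[:n_components] + mu
        # Observed cells are never overwritten; only missing cells are updated.
        updated = np.where(missing, recon, X)
        converged = np.max(np.abs(updated - filled)) < tol
        filled = updated
        if converged:
            break
    return filled

def rmse(estimate, truth, mask):
    """Root-mean-square error over the masked (artificially removed) cells."""
    return float(np.sqrt(np.mean((estimate[mask] - truth[mask]) ** 2)))

# Synthetic data (an assumption): 30 days x 48 half-hour slots built from
# two daily profiles plus Gaussian noise.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 48)
level = rng.uniform(200.0, 400.0, size=(30, 1))
skew = rng.normal(0.0, 50.0, size=(30, 1))
noise = rng.normal(0.0, 5.0, size=(30, 48))
X_true = level * np.sin(np.pi * t) + skew * np.sin(2 * np.pi * t) + noise

# Remove roughly 10% of the cells at random.
mask = rng.random(X_true.shape) < 0.10
X_obs = np.where(mask, np.nan, X_true)

X_pca = pca_impute(X_obs, n_components=2)
X_mean = np.where(mask, np.nanmean(X_obs, axis=0), X_obs)  # mean historical baseline

print("RMSE, mean historical:", rmse(X_mean, X_true, mask))
print("RMSE, iterative PCA:  ", rmse(X_pca, X_true, mask))
```

On data with this kind of low-rank daily structure, the PCA reconstruction can exploit the observed part of each day to estimate its missing part, whereas the historical mean ignores day-to-day variation. How large the gap is on real detector data depends on the data set and is not something this sketch establishes.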
Pages: 512-522
Page count: 11
Related papers
50 in total
  • [41] Robust Missing Traffic Flow Imputation Considering Nonnegativity and Road Capacity
    Tan, Huachun
    Wu, Yuankai
    Cheng, Bin
    Wang, Wuhong
    Ran, Bin
    [J]. MATHEMATICAL PROBLEMS IN ENGINEERING, 2014, 2014
  • [42] A Low-Rank Tensor Model for Imputation of Missing Vehicular Traffic Volume
    Pastor, Giancarlo
    [J]. IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, 2018, 67 (09) : 8934 - 8938
  • [43] LSTM-based traffic flow prediction with missing data
    Tian, Yan
    Zhang, Kaili
    Li, Jianyuan
    Lin, Xianxuan
    Yang, Bailin
    [J]. NEUROCOMPUTING, 2018, 318 : 297 - 305
  • [44] Multiple imputation: a mature approach to dealing with missing data
    Chevret, S.
    Seaman, S.
    Resche-Rigon, M.
    [J]. INTENSIVE CARE MEDICINE, 2015, 41 (02) : 348 - 350
  • [45] A MODEL-BASED APPROACH TO THE IMPUTATION OF MISSING DATA - HOME INJURY INCIDENCES
    CONN, JM
    LUI, KJ
    MCGEE, DL
    [J]. STATISTICS IN MEDICINE, 1989, 8 (03) : 263 - 266
  • [46] A nonparametric multiple imputation approach for missing categorical data
    Zhou, Muhan
    He, Yulei
    Yu, Mandi
    Hsu, Chiu-Hsieh
    [J]. BMC MEDICAL RESEARCH METHODOLOGY, 2017, 17
  • [47] A sequential distance-based approach for imputing missing data: Forward Imputation
    Solaro, Nadia
    Barbiero, Alessandro
    Manzi, Giancarlo
    Ferrari, Pier Alda
    [J]. ADVANCES IN DATA ANALYSIS AND CLASSIFICATION, 2017, 11 (02) : 395 - 414
  • [48] A nonparametric multiple imputation approach for missing categorical data
    Muhan Zhou
    Yulei He
    Mandi Yu
    Chiu-Hsieh Hsu
    [J]. BMC Medical Research Methodology, 17
  • [49] A sequential distance-based approach for imputing missing data: Forward Imputation
    Nadia Solaro
    Alessandro Barbiero
    Giancarlo Manzi
    Pier Alda Ferrari
    [J]. Advances in Data Analysis and Classification, 2017, 11 : 395 - 414
  • [50] Missing Value Imputation Approach for Mass Spectrometry-based Metabolomics Data
    Wei, Runmin
    Wang, Jingye
    Su, Mingming
    Jia, Erik
    Chen, Shaoqiu
    Chen, Tianlu
    Ni, Yan
    [J]. SCIENTIFIC REPORTS, 2018, 8