An Integrated Fuzzy C-Means Method for Missing Data Imputation Using Taxi GPS Data

Cited by: 14
Authors
Huang, Junsheng [1 ,2 ]
Mao, Baohua [1 ,2 ,3 ]
Bai, Yun [1 ,2 ]
Zhang, Tong [1 ,2 ]
Miao, Changjun [4 ]
Affiliations
[1] Beijing Jiaotong Univ, Sch Traff & Transportat, Beijing 100044, Peoples R China
[2] Beijing Jiaotong Univ, Key Lab Transport Ind Big Data Applicat Technol C, Beijing 100044, Peoples R China
[3] Beijing Jiaotong Univ, Integrated Transportat Res Ctr China, Beijing 100044, Peoples R China
[4] China Acad Railway Sci Corp Ltd, Signal & Commun Res Inst, Beijing 100081, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Intelligent Transportation System; missing value imputation; fuzzy C-means; genetic algorithm; expectation-maximization algorithm; regression; selection; prediction; values
DOI
10.3390/s20071992
Chinese Library Classification (CLC) number
O65 [Analytical Chemistry]
Subject classification codes
070302; 081704
Abstract
Various traffic-sensing technologies have been deployed to support traffic control. Because of factors such as malfunctioning devices and human error, Intelligent Transportation System (ITS) sensing datasets typically contain missing values, which degrades data quality. This study proposes an integrated imputation algorithm based on fuzzy C-means (FCM) and the genetic algorithm (GA) to improve the accuracy of the estimated values. The GA optimizes two FCM parameters: the membership-degree parameter and the number of cluster centroids. An experiment on taxi global positioning system (GPS) data from Manhattan, New York City, demonstrates the effectiveness of the integrated imputation approach. Three evaluation criteria, the root mean squared error (RMSE), the correlation coefficient (R), and the relative accuracy (RA), are used to assess the experimental results. Under the ±5% and ±10% thresholds, the average RAs of the integrated imputation method are 0.576 and 0.785, respectively, the highest among the compared methods, indicating that it outperforms both the historical imputation method and the conventional FCM method. In addition, clustering imputation performs better with the Euclidean distance than with the Manhattan distance. The proposed integrated imputation method can therefore be used to estimate missing values in daily traffic management.
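The abstract names FCM-based imputation, a choice between Euclidean and Manhattan distance, and three evaluation criteria (RMSE, R, RA). As a rough illustration only, the Python sketch below implements a generic FCM imputation scheme: cluster complete records, then fill each gap with the membership-weighted mean of the centroids, computing memberships from the observed entries only. It is not the authors' implementation. The imputation rule, all function names (`fcm`, `impute`, `relative_accuracy`, and so on), the RA definition (share of estimates whose relative error lies within ±threshold), and the toy data are assumptions, and the GA step that tunes the FCM parameters is not shown.

```python
import math
import random
from typing import Callable, List, Optional, Sequence

Record = Sequence[Optional[float]]  # None marks a missing entry
Distance = Callable[[Record, Sequence[float], Sequence[int]], float]


def euclidean(a: Record, b: Sequence[float], dims: Sequence[int]) -> float:
    """Euclidean distance restricted to the given dimensions."""
    return math.sqrt(sum((a[j] - b[j]) ** 2 for j in dims))


def manhattan(a: Record, b: Sequence[float], dims: Sequence[int]) -> float:
    """Manhattan (city-block) distance restricted to the given dimensions."""
    return sum(abs(a[j] - b[j]) for j in dims)


def memberships(dist: Sequence[float], m: float) -> List[float]:
    """FCM rule u_k = 1 / sum_l (d_k / d_l)^(2/(m-1)); the weights sum to 1."""
    return [1.0 / sum((dk / dl) ** (2.0 / (m - 1.0)) for dl in dist) for dk in dist]


def fcm(data: Sequence[Sequence[float]], c: int, m: float = 2.0,
        distance: Distance = euclidean, iters: int = 100,
        tol: float = 1e-6, seed: int = 0) -> List[List[float]]:
    """Plain fuzzy C-means on complete records; returns the c centroids."""
    rng = random.Random(seed)
    n, dims = len(data), range(len(data[0]))
    # Random initial membership matrix u[i][k], each row normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        u.append([x / sum(row) for x in row])
    centroids: List[List[float]] = []
    for _ in range(iters):
        # Centroid step: v_k = sum_i u_ik^m x_i / sum_i u_ik^m (weighted mean).
        centroids = []
        for k in range(c):
            w = [u[i][k] ** m for i in range(n)]
            centroids.append([sum(w[i] * data[i][j] for i in range(n)) / sum(w)
                              for j in dims])
        # Membership step; distances are clipped to avoid division by zero.
        new_u = [memberships([max(distance(x, v, dims), 1e-12) for v in centroids], m)
                 for x in data]
        change = max(abs(a - b) for r1, r2 in zip(u, new_u) for a, b in zip(r1, r2))
        u = new_u
        if change < tol:
            break
    return centroids


def impute(record: Record, centroids: Sequence[Sequence[float]], m: float = 2.0,
           distance: Distance = euclidean) -> List[float]:
    """Fill None entries with the membership-weighted mean of the centroids."""
    observed = [j for j, x in enumerate(record) if x is not None]
    u = memberships([max(distance(record, v, observed), 1e-12) for v in centroids], m)
    return [x if x is not None else sum(uk * v[j] for uk, v in zip(u, centroids))
            for j, x in enumerate(record)]


def rmse(actual: Sequence[float], est: Sequence[float]) -> float:
    return math.sqrt(sum((a - e) ** 2 for a, e in zip(actual, est)) / len(actual))


def pearson_r(actual: Sequence[float], est: Sequence[float]) -> float:
    n = len(actual)
    ma, me = sum(actual) / n, sum(est) / n
    cov = sum((a - ma) * (e - me) for a, e in zip(actual, est))
    sa = math.sqrt(sum((a - ma) ** 2 for a in actual))
    se = math.sqrt(sum((e - me) ** 2 for e in est))
    return cov / (sa * se)


def relative_accuracy(actual: Sequence[float], est: Sequence[float],
                      threshold: float = 0.05) -> float:
    """Assumed RA definition: share of estimates within ±threshold relative error."""
    hits = sum(1 for a, e in zip(actual, est) if abs(e - a) <= threshold * abs(a))
    return hits / len(actual)


if __name__ == "__main__":
    # Made-up toy records: two traffic regimes, three time slots each.
    train = [[10, 11, 12], [10.5, 11.2, 12.1], [9.8, 10.9, 11.7],
             [30, 32, 31], [29.5, 31.8, 30.6], [30.4, 32.3, 31.2]]
    cents = fcm(train, c=2)
    print(impute([30.2, None, 31.0], cents))  # middle slot is filled close to 32
```

In this sketch, one plausible way to add the GA step would be to treat the pair (m, c) as a chromosome and use RMSE on held-out, artificially removed values as the fitness. Passing `distance=manhattan` swaps the metric only in the membership step; the centroid step remains a weighted mean, which is the exact minimiser only for squared Euclidean distance, so the Manhattan variant here is an approximation.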
Pages: 19
Related papers (50 in total)
  • [1] On Missing Traffic Data Imputation Based on Fuzzy C-Means Method by Considering Spatial-Temporal Correlation
    Tang, Jinjun
    Wang, Yinhai
    Zhang, Shen
    Wang, Hue
    Liu, Fang
    Yu, Shaowei
    TRANSPORTATION RESEARCH RECORD, 2015, (2528) : 86 - 95
  • [2] A Study of Data Imputation Using Fuzzy C-Means with Particle Swarm Optimization
    Samat, Nurul Ashikin
    Salleh, Mohd Najib Mohd
    RECENT ADVANCES ON SOFT COMPUTING AND DATA MINING, 2017, 549 : 91 - 100
  • [3] Fuzzy c-means clustering of partially missing data sets
    Hathaway, RJ
    Overstreet, DD
    Bezdek, JC
    APPLICATIONS AND SCIENCE OF COMPUTATIONAL INTELLIGENCE III, 2000, 4055 : 159 - 165
  • [4] A hybrid approach to integrate fuzzy C-means based imputation method with genetic algorithm for missing traffic volume data estimation
    Tang, Jinjun
    Zhang, Guohui
    Wang, Yinhai
    Wang, Hua
    Liu, Fang
    TRANSPORTATION RESEARCH PART C-EMERGING TECHNOLOGIES, 2015, 51 : 29 - 40
  • [5] Fuzzy c-means classifier with deterministic initialization and missing value imputation
    Ichihashi, Hidetomo
    Honda, Katsuhiro
    Notsu, Akira
    Yagi, Takafumi
    2007 IEEE SYMPOSIUM ON FOUNDATIONS OF COMPUTATIONAL INTELLIGENCE, VOLS 1 AND 2, 2007, : 214 - +
  • [6] Fuzzy C-means method for clustering microarray data
    Dembélé, D
    Kastner, P
    BIOINFORMATICS, 2003, 19 (08) : 973 - 980
  • [7] On fuzzy c-means for data with tolerance
    Murata, Ryuichi
    Endo, Yasunori
    Haruyama, Hideyuki
    Miyamoto, Sadaaki
    MODELING DECISIONS FOR ARTIFICIAL INTELLIGENCE, 2006, 3885 : 351 - 361
  • [8] Application of Hard C-means and Fuzzy C-means in data fusion
    Tang Ai-Hong
    Cai Li-An
    Zhang You-Mei
    DIGITAL MANUFACTURING & AUTOMATION III, PTS 1 AND 2, 2012, 190-191 : 265 - 268
  • [9] Fuzzy c-means classifier for incomplete data sets with outliers and missing values
    Ichihashi, Hidetomo
    Honda, Katsuhiro
    INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE FOR MODELLING, CONTROL & AUTOMATION JOINTLY WITH INTERNATIONAL CONFERENCE ON INTELLIGENT AGENTS, WEB TECHNOLOGIES & INTERNET COMMERCE, VOL 2, PROCEEDINGS, 2006, : 457 - +
  • [10] The modified fuzzy c-means method for clustering of microarray data
    Taraskina, A. S.
    Cheremushkin, E. S.
    PROCEEDINGS OF THE FIFTH INTERNATIONAL CONFERENCE ON BIOINFORMATICS OF GENOME REGULATION AND STRUCTURE, VOL 1, 2006, : 180 - +