An Integrated Fuzzy C-Means Method for Missing Data Imputation Using Taxi GPS Data

Cited by: 14
Authors
Huang, Junsheng [1, 2]
Mao, Baohua [1, 2, 3]
Bai, Yun [1, 2]
Zhang, Tong [1, 2]
Miao, Changjun [4]
Affiliations
[1] Beijing Jiaotong Univ, Sch Traff & Transportat, Beijing 100044, Peoples R China
[2] Beijing Jiaotong Univ, Key Lab Transport Ind Big Data Applicat Technol C, Beijing 100044, Peoples R China
[3] Beijing Jiaotong Univ, Integrated Transportat Res Ctr China, Beijing 100044, Peoples R China
[4] China Acad Railway Sci Corp Ltd, Signal & Commun Res Inst, Beijing 100081, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Intelligent Transportation System; missing values imputation; fuzzy C-means; genetic algorithm; EXPECTATION-MAXIMIZATION ALGORITHM; GENETIC ALGORITHM; REGRESSION; SELECTION; PREDICTION; VALUES;
DOI
10.3390/s20071992
Chinese Library Classification number
O65 [Analytical Chemistry]
Discipline classification code
070302; 081704
Abstract
Various traffic-sensing technologies have been employed to facilitate traffic control. Because of factors such as malfunctioning devices and human error, missing values commonly occur in Intelligent Transportation System (ITS) sensing datasets and degrade data quality. In this study, an integrated imputation algorithm based on fuzzy C-means (FCM) and the genetic algorithm (GA) is proposed to improve the accuracy of the estimated values. The GA is applied to optimize the membership-degree parameter and the number of cluster centroids in the FCM model. An experiment on taxi global positioning system (GPS) data from Manhattan, New York City, demonstrates the effectiveness of the integrated imputation approach. Three evaluation criteria, the root mean squared error (RMSE), correlation coefficient (R), and relative accuracy (RA), are used to verify the experimental results. Under the ±5% and ±10% thresholds, the average RAs obtained by the integrated imputation method are 0.576 and 0.785, the highest among the compared methods, indicating that the integrated imputation method outperforms the history imputation method and the conventional FCM method. In addition, clustering imputation performs better with the Euclidean distance than with the Manhattan distance. Thus, the proposed integrated imputation method can be employed to estimate missing values in daily traffic management.
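The abstract names the building blocks (FCM clustering, a membership-degree parameter, a cluster count, and the RMSE, R, and RA criteria) without giving formulas. The Python sketch below illustrates one common way these pieces fit together; it is not the authors' implementation. Assumptions made here and not stated in the record: missing entries are handled with a partial-distance strategy, imputed values are membership-weighted centroids, RA is read as the share of estimates whose relative error falls within the threshold, and all function and parameter names (`fcm_impute`, `n_clusters`, `m`, and so on) are hypothetical. The GA search over `m` and `n_clusters` is omitted; both are simply passed in as fixed arguments.

```python
import numpy as np

def fcm_impute(X, n_clusters=3, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Fill NaNs in X (samples x features) using fuzzy C-means.

    Assumed scheme: partial-distance FCM, then membership-weighted centroids.
    m > 1 is the fuzzifier (the membership-degree parameter).
    """
    X = np.asarray(X, dtype=float)
    observed = ~np.isnan(X)                  # True where a value is present
    X0 = np.where(observed, X, 0.0)          # placeholders are masked out below
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], n_clusters)) # random initial memberships
    U /= U.sum(axis=1, keepdims=True)        # each row sums to 1
    for _ in range(n_iter):
        W = U ** m
        # Centroids are weighted means over observed entries only.
        V = (W.T @ X0) / np.maximum(W.T @ observed.astype(float), 1e-12)
        # Squared Euclidean distance over observed features, rescaled by
        # the fraction of features that are observed in each sample.
        diff = (X0[:, None, :] - V[None, :, :]) * observed[:, None, :]
        n_obs = np.maximum(observed.sum(axis=1), 1)
        D = (diff ** 2).sum(axis=2) * (X.shape[1] / n_obs)[:, None]
        D = np.maximum(D, 1e-12)             # avoid division by zero
        U_new = D ** (-1.0 / (m - 1.0))      # standard FCM membership update
        U_new /= U_new.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    W = U ** m
    estimate = (W @ V) / W.sum(axis=1, keepdims=True)
    return np.where(observed, X, estimate), U, V

def rmse(y_true, y_pred):
    """Root mean squared error."""
    err = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    return float(np.sqrt(np.mean(err ** 2)))

def correlation(y_true, y_pred):
    """Pearson correlation coefficient R."""
    return float(np.corrcoef(y_true, y_pred)[0, 1])

def relative_accuracy(y_true, y_pred, threshold=0.05):
    """Assumed RA: share of estimates within +/- threshold relative error."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rel_err = np.abs(y_pred - y_true) / np.abs(y_true)
    return float(np.mean(rel_err <= threshold))

# Toy usage: two obvious groups, one missing entry in each.
X_demo = np.array([[1.0, 2.0], [1.1, np.nan], [5.0, 6.0], [np.nan, 6.1]])
filled_demo, U_demo, V_demo = fcm_impute(X_demo, n_clusters=2)
```

In an evaluation of this kind, known values would be hidden on purpose, imputed, and then compared against the held-back truth with `rmse`, `correlation`, and `relative_accuracy` at thresholds such as 0.05 and 0.10. Switching the distance to Manhattan would only change the line that computes `D`.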
Pages: 19
Related papers
50 in total
  • [41] On Kernel Fuzzy c-Means for Data with Tolerance using Explicit Mapping for Kernel Data Analysis
    Kanzawa, Yuchi
    Endo, Yasunori
    Miyamoto, Sadaaki
    2010 IEEE INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS (FUZZ-IEEE 2010), 2010
  • [42] Fuzzy C-means method with empirical mode decomposition for clustering microarray data
    Wang, Yan-Fei
    Yu, Zu-Guo
    Anh, Vo
    2010 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE, 2010, : 192 - 197
  • [43] On Kernel Fuzzy c-Means for Data with Tolerance Using Explicit Mapping for Kernel Data Analysis
    Kanzawa, Yuchi
    Endo, Yasunori
    Miyamoto, Sadaaki
    JOURNAL OF ADVANCED COMPUTATIONAL INTELLIGENCE AND INTELLIGENT INFORMATICS, 2012, 16 (01) : 162 - 168
  • [44] Fuzzy Classification Function of Fuzzy c-Means Algorithms for Data with Tolerance
    Kanzawa, Yuchi
    Endo, Yasunori
    Miyamoto, Sadaaki
    2008 IEEE INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS, VOLS 1-5, 2008, : 1083 - +
  • [45] Comparison of Illiteracy Cluster Pattern and Population Data using Fuzzy C-Means
    Rochmaniyah, Ni'matul
    Pujianto, Utomo
    2017 INTERNATIONAL CONFERENCE ON SUSTAINABLE INFORMATION ENGINEERING AND TECHNOLOGY (SIET), 2017, : 255 - 258
  • [46] A Novel Method of Clustering ECG Arrhythmia data using Robust Spatial Kernel Fuzzy C-Means
    Roopa, C. K.
    Harish, B. S.
    Kumar, S. V. Aruna
    8TH INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTING & COMMUNICATIONS (ICACC-2018), 2018, 143 : 133 - 140
  • [47] Anomaly Detection in Time Series Data using a Fuzzy C-Means Clustering
    Izakian, Hesam
    Pedrycz, Witold
    PROCEEDINGS OF THE 2013 JOINT IFSA WORLD CONGRESS AND NAFIPS ANNUAL MEETING (IFSA/NAFIPS), 2013, : 1513 - 1518
  • [48] Interval kernel Fuzzy C-Means clustering of incomplete data
    Li, Tianhao
    Zhang, Liyong
    Lu, Wei
    Hou, Hui
    Liu, Xiaodong
    Pedrycz, Witold
    Zhong, Chongquan
    NEUROCOMPUTING, 2017, 237 : 316 - 331
  • [49] Fuzzy c-means clustering methods for symbolic interval data
    de Carvalho, Francisco de A. T.
    PATTERN RECOGNITION LETTERS, 2007, 28 (04) : 423 - 437
  • [50] A New Fuzzy c-Means Clustering Algorithm for Interval Data
    Jin, Yan
    Ma, Jianghong
    2013 INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND ARTIFICIAL INTELLIGENCE (ICCSAI 2013), 2013, : 156 - 159