An Integrated Fuzzy C-Means Method for Missing Data Imputation Using Taxi GPS Data

Cited by: 14
Authors
Huang, Junsheng [1 ,2 ]
Mao, Baohua [1 ,2 ,3 ]
Bai, Yun [1 ,2 ]
Zhang, Tong [1 ,2 ]
Miao, Changjun [4 ]
Affiliations
[1] Beijing Jiaotong Univ, Sch Traff & Transportat, Beijing 100044, Peoples R China
[2] Beijing Jiaotong Univ, Key Lab Transport Ind Big Data Applicat Technol C, Beijing 100044, Peoples R China
[3] Beijing Jiaotong Univ, Integrated Transportat Res Ctr China, Beijing 100044, Peoples R China
[4] China Acad Railway Sci Corp Ltd, Signal & Commun Res Inst, Beijing 100081, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Intelligent Transportation System; missing values imputation; fuzzy C-means; genetic algorithm; EXPECTATION-MAXIMIZATION ALGORITHM; GENETIC ALGORITHM; REGRESSION; SELECTION; PREDICTION; VALUES;
DOI
10.3390/s20071992
Chinese Library Classification
O65 [Analytical Chemistry];
Discipline classification codes
070302 ; 081704 ;
Abstract
Various traffic-sensing technologies have been employed to facilitate traffic control. Because of factors such as malfunctioning devices and human error, missing values commonly occur in Intelligent Transportation System (ITS) sensing datasets, which lowers data quality. In this study, an integrated imputation algorithm based on fuzzy C-means (FCM) and the genetic algorithm (GA) is proposed to improve the accuracy of the estimated values. The GA is applied to optimize the membership-degree parameter and the number of cluster centroids in the FCM model. An experimental test on taxi global positioning system (GPS) data from Manhattan, New York City, is used to demonstrate the effectiveness of the integrated imputation approach. Three evaluation criteria, the root mean squared error (RMSE), correlation coefficient (R), and relative accuracy (RA), are used to verify the experimental results. Under the ±5% and ±10% thresholds, the average RAs obtained by the integrated imputation method are 0.576 and 0.785, respectively, the highest among the methods compared, indicating that the integrated imputation method outperforms the history imputation method and the conventional FCM method. In addition, clustering imputation with the Euclidean distance performs better than with the Manhattan distance. Thus, the proposed integrated imputation method can be employed to estimate missing values in daily traffic management.
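The three evaluation criteria named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the formulas, so the definitions here are assumptions: RMSE and R follow their standard definitions, and RA is taken to be the fraction of imputed values whose relative error falls within the stated threshold. The function names and the sample values are hypothetical and are not taken from the paper.

```python
import numpy as np


def rmse(actual, imputed):
    """Root mean squared error between true and imputed values."""
    actual = np.asarray(actual, dtype=float)
    imputed = np.asarray(imputed, dtype=float)
    return float(np.sqrt(np.mean((imputed - actual) ** 2)))


def correlation(actual, imputed):
    """Pearson correlation coefficient (R) between true and imputed values."""
    actual = np.asarray(actual, dtype=float)
    imputed = np.asarray(imputed, dtype=float)
    return float(np.corrcoef(actual, imputed)[0, 1])


def relative_accuracy(actual, imputed, threshold):
    """Assumed RA: share of imputed values within +/- threshold relative error.

    Requires nonzero true values, since the error is divided by them.
    """
    actual = np.asarray(actual, dtype=float)
    imputed = np.asarray(imputed, dtype=float)
    relative_error = np.abs(imputed - actual) / np.abs(actual)
    return float(np.mean(relative_error <= threshold))


if __name__ == "__main__":
    # Hypothetical values held out as "missing" and their imputed estimates.
    true_values = [40.0, 50.0, 60.0, 80.0]
    estimates = [41.0, 49.0, 65.0, 70.0]

    print("RMSE:", rmse(true_values, estimates))
    print("R:", correlation(true_values, estimates))
    print("RA at 5%:", relative_accuracy(true_values, estimates, 0.05))
    print("RA at 10%:", relative_accuracy(true_values, estimates, 0.10))
```

Under this assumed definition, RA can only stay the same or rise as the threshold widens, which is consistent with the reported average RA being higher at ±10% than at ±5%.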
Pages: 19
Related papers
50 in total
  • [11] Missing value estimation for microarray data based on fuzzy C-means clustering
    Luo, JiaWei
    Yang, Tao
    Wang, Yan
    Eighth International Conference on High-Performance Computing in Asia-Pacific Region, Proceedings, 2005, : 611 - 616
  • [12] A hybrid method for imputation of missing values using optimized fuzzy c-means with support vector regression and a genetic algorithm
    Aydilek, Ibrahim Berkan
    Arslan, Ahmet
    INFORMATION SCIENCES, 2013, 233 : 25 - 35
  • [13] Towards missing data imputation: A study of fuzzy K-means clustering method
    Li, D
    Deogun, J
    Spaulding, W
    Shuart, B
    ROUGH SETS AND CURRENT TRENDS IN COMPUTING, 2004, 3066 : 573 - 579
  • [14] A fuzzy clustering model of data and fuzzy c-means
    Nascimento, S
    Mirkin, B
    Moura-Pires, F
    NINTH IEEE INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS (FUZZ-IEEE 2000), VOLS 1 AND 2, 2000, : 302 - 307
  • [15] Hybrid Missing Value Imputation Algorithms Using Fuzzy C-Means and Vaguely Quantified Rough Set
    Li, Daiwei
    Zhang, Haiqing
    Li, Tianrui
    Bouras, Abdelaziz
    Yu, Xi
    Wang, Tao
    IEEE TRANSACTIONS ON FUZZY SYSTEMS, 2022, 30 (05) : 1396 - 1408
  • [16] ESTIMATION OF MISSING VALUES USING OPTIMISED HYBRID FUZZY C-MEANS AND MAJORITY VOTE FOR MICROARRAY DATA
    Kumaran, Shamini Raja
    Othman, Mohd Shahizan
    Yusuf, Lizawati Mi
    JOURNAL OF INFORMATION AND COMMUNICATION TECHNOLOGY-MALAYSIA, 2020, 19 (04): : 459 - 482
  • [17] Fuzzy c-means classifier for relational data
    Ichihashi, Hidetomo
    Honda, Katsuhiro
    Kuramoto, Yasuhiro
    Matsuura, Fumiaki
    2007 IEEE SYMPOSIUM ON COMPUTATIONAL INTELLIGENCE AND DATA MINING, VOLS 1 AND 2, 2007, : 328 - 334
  • [18] Fuzzy c-means clustering of incomplete data
    Hathaway, RJ
    Bezdek, JC
    IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART B-CYBERNETICS, 2001, 31 (05): : 735 - 744
  • [19] Clustering of COVID-19 data for knowledge discovery using c-means and fuzzy c-means
    Afzal, Asif
    Ansari, Zahid
    Alshahrani, Saad
    Raj, Arun K.
    Kuruniyan, Mohamed Saheer
    Saleel, C. Ahamed
    Nisar, Kottakkaran Sooppy
    RESULTS IN PHYSICS, 2021, 29
  • [20] Missing data analysis with fuzzy C-Means: A study of its application in a psychological scenario
    Di Nuovo, Alessandro G.
    EXPERT SYSTEMS WITH APPLICATIONS, 2011, 38 (06) : 6793 - 6797