Missing value imputation for the analysis of incomplete traffic accident data

Cited by: 73
Authors
Deb, Rupam [1]
Liew, Alan Wee-Chung [1]
Affiliations
[1] Griffith Univ, Sch Informat & Commun Technol, Gold Coast Campus, Nathan, Qld 4222, Australia
Keywords
Data preprocessing; Decision tree; Missing value imputation; Categorical data; Traffic accident; FRAMEWORK; ERROR
DOI
10.1016/j.ins.2016.01.018
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Death, injury and disability resulting from road traffic crashes continue to be a major global public health problem. Recent data suggest that traffic crashes cause more than 1.25 million fatalities each year, with non-fatal injuries affecting a further 20-50 million people. It is predicted that by 2030 road traffic accidents will have become the 5th leading cause of death and that the annual number of deaths from traffic accidents will have doubled from current levels. Both developed and developing countries suffer from the consequences of growth in human population and, therefore, in vehicle population. Methods to reduce accident severity are thus of great interest to traffic agencies and the public at large. To analyse traffic accident factors effectively, we need a complete historical traffic accident database. Any missing data in the database could prevent the discovery of important environmental and road accident factors and lead to invalid conclusions. In this paper, we present a novel imputation method that exploits within-record and between-record correlations to impute missing numerical or categorical values. In addition, our algorithm accounts for uncertainty in real-world data by sampling from a list of potential imputed values according to their affinity degree. We evaluated our algorithm using four publicly available traffic accident databases from the United States. The first is the largest open federal database in the United States (explore.data.gov), and the second is based on the National Incident Based Reporting System (NIBRS) of the city and county of Denver (data.opencolorado.org). The other two are from New York's open data portal (Motor Vehicle Crashes-case information: 2011 and Motor Vehicle Crashes-individual information: 2011, data.ny.gov). We compare our algorithm with four state-of-the-art imputation methods using missing value imputation accuracy and RMSE. Our results indicate that the proposed method performs significantly better than the existing algorithms we compared. © 2016 Elsevier Inc. All rights reserved.
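The abstract names two ideas that a short sketch can make concrete: drawing an imputed value from a candidate list in proportion to an affinity degree, and scoring imputations by accuracy (categorical) and RMSE (numerical). The Python below is only an illustration under stated assumptions. The function names and the example affinity scores are hypothetical, and the abstract does not say how the paper derives affinity degrees from within-record and between-record correlations, so that step is not reproduced here.

```python
import math
import random


def affinity_weighted_choice(candidates, rng):
    """Draw one candidate value with probability proportional to its affinity.

    `candidates` maps each potential imputed value to a non-negative
    affinity score (hypothetical input; how the paper computes these
    scores is not described in the abstract).
    """
    values = list(candidates.keys())
    weights = list(candidates.values())
    return rng.choices(values, weights=weights, k=1)[0]


def imputation_accuracy(true_values, imputed_values):
    """Fraction of imputed categorical values equal to the held-out truth."""
    matches = sum(t == p for t, p in zip(true_values, imputed_values))
    return matches / len(true_values)


def rmse(true_values, imputed_values):
    """Root mean squared error between held-out and imputed numeric values."""
    squared_error = sum((t - p) ** 2 for t, p in zip(true_values, imputed_values))
    return math.sqrt(squared_error / len(true_values))


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs give the same draws
    # Hypothetical affinity scores for a missing "road surface" attribute.
    road_surface = {"wet": 0.6, "dry": 0.3, "icy": 0.1}
    print(affinity_weighted_choice(road_surface, rng))
    print(imputation_accuracy(["a", "b", "c", "d"], ["a", "b", "x", "d"]))
    print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```

Sampling, rather than always taking the highest-affinity candidate, is one way to reflect the uncertainty the abstract mentions: lower-affinity values are still chosen occasionally instead of being ruled out.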
Pages: 274-289
Number of pages: 16