An Interpretable Machine Learning-Based Hurdle Model for Zero-Inflated Road Crash Frequency Data Analysis: Real-World Assessment and Validation

Authors
Ben Khedher, Moataz Bellah [1, 2]
Yun, Dukgeun [1, 2]
Affiliations
[1] Department of Civil and Environmental Engineering, KICT School, University of Science and Technology, Daejeon 34113, Republic of Korea
[2] Department of Highway and Transportation Research, Korea Institute of Civil Engineering and Building Technology, Goyang-si 10223, Republic of Korea
Source
Applied Sciences (Switzerland) | 2024 / Volume 14 / Issue 23
Keywords
Highway accidents
DOI
10.3390/app142310790
Abstract
Road traffic crashes pose significant economic and public health burdens, necessitating an in-depth understanding of crash causation and its links to underlying factors. This study introduces a machine learning-based hurdle model framework tailored for analyzing zero-inflated crash frequency data, addressing the limitations of traditional statistical models like the Poisson and negative binomial models, which struggle with zero-inflation and overdispersion. The research employs a two-stage modeling process using CatBoost. The first stage uses binary classification to identify road segments with potential crash occurrences, applying a customized loss function to tackle data imbalance. The second stage predicts crash frequency, also utilizing a customized loss function for count data. SHapley Additive exPlanations (SHAP) analysis interprets the model outcomes, providing insights into factors affecting crash likelihood and frequency. This study validates the model’s performance with real-world crash data from 2011 to 2015 in South Korea, demonstrating superior accuracy in both the classification and regression stages compared to other machine learning algorithms and traditional models. These findings have significant implications for traffic safety research and policymaking, offering stakeholders a more accurate and interpretable tool for crash data analysis to develop targeted safety interventions. © 2024 by the authors.
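The two-stage structure described in the abstract can be illustrated with a minimal sketch of a generic hurdle model. This is not the authors' CatBoost implementation: it assumes the first stage has already produced a crash probability (`p_crash`, a hypothetical name) and that the second stage follows a zero-truncated Poisson with rate `lam` (also hypothetical). The customized loss functions used in the paper are not given in the abstract and are not reproduced here.

```python
import math


def hurdle_pmf(k: int, p_crash: float, lam: float) -> float:
    """P(Y = k) for a hurdle model with a zero-truncated Poisson count part.

    p_crash: stage-1 output, probability that a segment has at least one crash.
    lam:     stage-2 rate of the (untruncated) Poisson for positive counts.
    Both parameter names are illustrative assumptions.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    if k == 0:
        # Zeros come only from failing to cross the hurdle.
        return 1.0 - p_crash
    # Poisson mass at k, renormalised to exclude zero.
    poisson_k = math.exp(-lam) * lam**k / math.factorial(k)
    return p_crash * poisson_k / (1.0 - math.exp(-lam))


def hurdle_mean(p_crash: float, lam: float) -> float:
    """Expected crash count: P(cross hurdle) * E[count | count >= 1]."""
    return p_crash * lam / (1.0 - math.exp(-lam))


if __name__ == "__main__":
    # Example segment: 20% chance of any crash, truncated-Poisson rate 1.5.
    print(hurdle_pmf(0, 0.2, 1.5))
    print(hurdle_mean(0.2, 1.5))
```

Separating the zero process from the count process in this way lets each stage be fitted and interpreted on its own, which is the property the abstract relies on when it applies SHAP to crash likelihood and crash frequency separately.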