Investigating the Role of Clustering in Construction-Accident Severity Prediction Using a Heterogeneous and Imbalanced Data Set

Times cited: 8
Authors
Salarian, Ali Akbar [1]
Etemadfard, Hossein [1]
Rahimzadegan, Ali [1]
Ghalehnovi, Mansour [1]
Affiliation
[1] Ferdowsi Univ Mashhad, Dept Civil Engn, Mashhad 93, Iran
Keywords
Construction accident analysis; Heterogeneity; Class imbalance; Oversampling; Clustering; Identifying root causes; Cost overruns; Safety; Megaprojects; Models; Risks; Sites; SMOTE
DOI
10.1061/(ASCE)CO.1943-7862.0002406
Chinese Library Classification number
TU [Architectural science]
Discipline code
0813
Abstract
Despite remarkable advances, the construction industry remains among the most hazardous industries, and its accidents occur at different severity levels. Construction accident data sets are available for analysis, but they suffer from heterogeneity and class imbalance. The many complexities and uncertainties in construction projects make the data heterogeneous, which degrades the predictive performance of machine learning algorithms. Class imbalance arises because accidents of different severities are unequally distributed, which biases prediction results. This study aimed to assess the impact of clustering on construction accident analysis when a data set is heterogeneous and imbalanced, and to take a step toward making incidents more predictable. Accidents were predicted under four data preparation approaches: unmodified, balanced, clustered, and clustered + balanced. The k-means clustering algorithm was adopted to split the data into homogeneous clusters. The synthetic minority oversampling technique (SMOTE) and k-means SMOTE (KMSMOTE) were used to overcome the class imbalance. Five supervised machine learning algorithms were employed for prediction: classification and regression tree (CART), support vector machine (SVM), random forest (RF), extreme gradient boosting (XGB), and artificial neural network (ANN). The results indicated that clustering significantly improved the predictive performance of the algorithms. Combining clustering with oversampling was also the most appropriate approach for analyzing accidents, providing more accurate and reliable predictions; this approach improved average precision, recall, and F1-score by about 33%, 23%, and 33%, respectively. Moreover, the ensemble learning classifiers used, RF and XGB, outperformed the other models.
Ultimately, this research can help safety professionals predict accident outcomes more accurately and take more appropriate safety measures. © 2022 American Society of Civil Engineers
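The clustered approaches rely on k-means to split the accident records into more homogeneous groups. The abstract does not give the number of clusters, the features, or the initialization, so the following is only a minimal pure-Python sketch of Lloyd's k-means, assuming the accident attributes are already numerically encoded and scaled. `kmeans` and `squared_distance` are hypothetical helper names, and seeding the centroids with randomly chosen records is an illustrative choice (k-means++ seeding is a common alternative), not the authors' reported setting.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Point], k: int, max_iter: int = 100, seed: int = 0) -> Tuple[List[int], List[Point]]:
    """Lloyd's k-means: returns a cluster label per record and the final centroids."""
    rng = random.Random(seed)
    # Hypothetical initialization: k distinct records picked at random.
    centroids: List[Point] = [tuple(p) for p in rng.sample(points, k)]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each record joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


# Toy records (invented): two well-separated groups of encoded accident attributes.
records = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centroids = kmeans(records, k=2)
print(labels, centroids)
```

One way to use the resulting labels, assumed here rather than taken from the paper, is to train a separate classifier on the records of each cluster so that each model sees a less heterogeneous subset.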
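SMOTE counters class imbalance by synthesizing new minority-class records on the line segment between a minority sample and one of its nearest minority-class neighbours; k-means SMOTE applies the same interpolation inside selected k-means clusters. The sketch below is a minimal pure-Python version of the basic SMOTE interpolation, assuming numeric features and string severity labels. `smote` and `balance_with_smote` are hypothetical names; `k=5` is only the commonly used neighbour count, and oversampling every class up to the majority-class size is an assumption, not a setting reported in the abstract.

```python
import random
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote(minority: List[Point], n_synthetic: int, k: int = 5, seed: int = 0) -> List[Point]:
    """Synthesize n_synthetic records by interpolating between a random minority
    sample and one of its k nearest minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic: List[Point] = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of the base sample within the same class, excluding itself.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: squared_distance(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


def balance_with_smote(X: List[Point], y: List[str], k: int = 5, seed: int = 0) -> Tuple[List[Point], List[str]]:
    """Oversample every smaller class up to the size of the largest class."""
    by_class: Dict[str, List[Point]] = {}
    for x, label in zip(X, y):
        by_class.setdefault(label, []).append(x)
    target = max(len(v) for v in by_class.values())
    X_out, y_out = list(X), list(y)
    for offset, label in enumerate(sorted(by_class)):
        missing = target - len(by_class[label])
        if missing > 0:
            new = smote(by_class[label], missing, k=k, seed=seed + offset)
            X_out.extend(new)
            y_out.extend([label] * len(new))
    return X_out, y_out


# Toy data (invented): five "minor" and three "severe" accident records.
X = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0), (0.0, 5.0), (1.0, 5.0), (0.0, 6.0)]
y = ["minor"] * 5 + ["severe"] * 3
X_bal, y_bal = balance_with_smote(X, y, k=2)
print(y_bal.count("minor"), y_bal.count("severe"))  # prints: 5 5
```

Oversampling of this kind should be applied to the training split only, so that synthetic records do not leak into the data used for evaluation.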
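The abstract reports gains in average precision, recall, and F1-score but does not state how the averages were taken. The sketch below assumes macro-averaging, that is, computing each metric per severity class and taking the unweighted mean, which is a common choice for imbalanced multiclass problems because rare classes count as much as frequent ones. The function names and the toy labels are hypothetical and serve only to illustrate why accuracy alone can hide the bias that class imbalance produces.

```python
from typing import Dict, Sequence

METRICS = ("precision", "recall", "f1")


def per_class_scores(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, Dict[str, float]]:
    """Precision, recall and F1 for every class, using one-vs-rest counts."""
    scores: Dict[str, Dict[str, float]] = {}
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0  # 0 when the class is never predicted
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[c] = {"precision": precision, "recall": recall, "f1": f1}
    return scores


def macro_average(scores: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Unweighted mean of each metric over classes."""
    return {m: sum(s[m] for s in scores.values()) / len(scores) for m in METRICS}


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Share of records whose predicted class matches the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy example (invented labels): a model that always predicts the majority class.
y_true = ["minor", "minor", "minor", "severe"]
y_pred = ["minor", "minor", "minor", "minor"]
macro = macro_average(per_class_scores(y_true, y_pred))
print(accuracy(y_true, y_pred), macro)
```

On this toy example the model reaches 75% accuracy while never predicting the "severe" class, and its macro-averaged recall of 0.5 exposes that bias.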
Pages: 13