Construction accident narrative classification: An evaluation of text mining techniques

Cited by: 152
Authors
Goh, Yang Miang [1]
Ubeynarayana, C. U. [1]
Affiliations
[1] Natl Univ Singapore, Sch Design & Environm, Dept Bldg, SaRRU, 4 Architecture Dr, Singapore 117566, Singapore
Source
Keywords
Accident classification; Construction safety; Data mining; Support vector machine; Text mining; SAFETY;
DOI
10.1016/j.aap.2017.08.026
Chinese Library Classification (CLC) number
TB18 [Ergonomics];
Discipline classification code
1201;
Abstract
Learning from past accidents is fundamental to accident prevention. Thus, accident and near-miss reporting are encouraged by organizations and regulators. However, for organizations managing large safety databases, the time taken to accurately classify accident and near-miss narratives can be substantial. This study aims to evaluate the utility of various text mining classification techniques in classifying 1000 publicly available construction accident narratives obtained from the US OSHA website. The study evaluated six machine learning algorithms, including support vector machine (SVM), linear regression (LR), random forest (RF), k-nearest neighbor (KNN), decision tree (DT) and Naive Bayes (NB), and found that SVM produced the best performance in classifying the test set of 251 cases. Further experimentation with tokenization of the processed text and with non-linear SVM was also conducted. In addition, a grid search was conducted on the hyperparameters of the SVM models. It was found that the best performing classifiers were linear SVM with unigram tokenization and radial basis function (RBF) SVM with unigram tokenization. In view of its relative simplicity, the linear SVM is recommended. Across the 11 labels of accident causes or types, the precision of the linear SVM ranged from 0.5 to 1, recall ranged from 0.36 to 0.9 and F1 score was between 0.45 and 0.92. The reasons for misclassification were discussed and suggestions on ways to improve the performance were provided.
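The per-label precision, recall and F1 figures quoted in the abstract can be illustrated with a minimal sketch. The label names, the toy data and the helper `per_label_scores` below are hypothetical and are not taken from the paper; the sketch only assumes that each narrative receives exactly one label and uses the standard definitions of the three metrics.

```python
from collections import Counter


def per_label_scores(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for single-label predictions."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this label
        else:
            fp[pred] += 1    # predicted label was wrong
            fn[truth] += 1   # true label was missed
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1)
    return scores


# Hypothetical labels and predictions, invented for illustration only.
y_true = ["falls", "falls", "falls", "electrocution", "electrocution", "struck_by"]
y_pred = ["falls", "falls", "struck_by", "electrocution", "falls", "struck_by"]
scores = per_label_scores(y_true, y_pred)

for label, (p, r, f1) in scores.items():
    print(f"{label}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

Reporting the scores per label, rather than as a single accuracy figure, shows which accident types a classifier handles poorly; this is the form in which the abstract states its ranges.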
Pages: 122-130
Page count: 9
Related papers
50 in total
  • [1] Construction site accident analysis using text mining and natural language processing techniques
    Zhang, Fan
    Fleyeh, Hasan
    Wang, Xinru
    Lu, Minghui
    [J]. AUTOMATION IN CONSTRUCTION, 2019, 99 : 238 - 248
  • [2] A SURVEY ON CLASSIFICATION TECHNIQUES FOR TEXT MINING
    Brindha, S.
    Sukumaran, S.
    Prabha, K.
    [J]. 2016 3RD INTERNATIONAL CONFERENCE ON ADVANCED COMPUTING AND COMMUNICATION SYSTEMS (ICACCS), 2016,
  • [3] Text mining-based construction site accident classification using hybrid supervised machine learning
    Cheng, Min-Yuan
    Kusoemo, Denny
    Gosno, Richard Antoni
    [J]. AUTOMATION IN CONSTRUCTION, 2020, 118
  • [4] Automated Operations Classification using Text Mining Techniques
    Esmael, Bilal
    Arnaout, Mohammad Arghad
    Fruhwirth, Rudolf K.
    Thonhauser, Gerhard
    [J]. 2010 THE 3RD INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND INDUSTRIAL APPLICATION (PACIIA2010), VOL V, 2010, : 235 - 238
  • [5] Arabic dialects classification using text mining techniques
    AL-Walaie, Mona Abdullah
    Khan, Muhammad Badruddin
    [J]. 2017 INTERNATIONAL CONFERENCE ON COMPUTER AND APPLICATIONS (ICCA), 2017, : 325 - 329
  • [6] Construction-Accident Narrative Classification Using Shallow and Deep Learning
    Qiao, Jianfeng
    Wang, Changfeng
    Guan, Shuang
    Shuran, Lv
    [J]. JOURNAL OF CONSTRUCTION ENGINEERING AND MANAGEMENT, 2022, 148 (09)
  • [7] Construction and analysis of a coal mine accident causation network based on text mining
    Qiu, Zunxiang
    Liu, Quanlong
    Li, Xinchun
    Zhang, Jinjia
    Zhang, Yueqian
    [J]. PROCESS SAFETY AND ENVIRONMENTAL PROTECTION, 2021, 153 : 320 - 328
  • [8] Biomedical literature mining for text classification and construction of gene networks
    Antonakaki, Despoina
    Kanterakis, Alexandros
    Potamias, George
    [J]. ADVANCES IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 2006, 3955 : 469 - 473
  • [9] Automatic classification of academic documents using text mining techniques
    Nunez, Haydemar
    Ramos, Esmeralda
    [J]. 2012 XXXVIII CONFERENCIA LATINOAMERICANA EN INFORMATICA (CLEI), 2012,
  • [10] Evaluation of Normalization Techniques in Text Classification for Portuguese
    Conrado, Merley da Silva
    Laguna Gutierrez, Victor Antonio
    Rezende, Solange Oliveira
    [J]. COMPUTATIONAL SCIENCE AND ITS APPLICATIONS - ICCSA 2012, PT III, 2012, 7335 : 618 - 630