Safety compliance checking of construction behaviors using visual question answering

被引:15
|
作者
Ding, Yuexiong [1 ,2 ]
Liu, Muyang [1 ,2 ]
Luo, Xiaowei [1 ,2 ]
机构
[1] City Univ Hong Kong, Dept Architecture & Civil Engn, Hong Kong, Peoples R China
[2] City Univ Hong Kong, Shenzhen Res Inst, Architecture & Civil Engn Res Ctr, Shenzhen, Peoples R China
关键词
Construction safety management; Safety compliance checking; Visual reasoning; Visual question answering; Cross -modal model; Vision -and -language Transformer; FALLS;
D O I
10.1016/j.autcon.2022.104580
中图分类号
TU [建筑科学];
学科分类号
0813 ;
摘要
Unsafe construction behavior, one of the leading factors of accidents and casualties, can be reduced by strengthening construction inspection. However, current methods use either manual inspection or inefficient cross-modal models based on multiple backbone networks. To alleviate the problems, a "rule-question" trans-formation and annotation system is formulated, and the unsafe behavior detection is turned into a visual reasoning task: visual question answering (VQA). The VQA model is developed based on a vision-and-language Transformer, and the unsafe behavior could be identified based on the output answers. A dataset containing 16 safety rules and 2386 related construction images is used to fine-tune and validate the VQA model. The results show that the developed VQA model achieves an average recall of 0.81 at a faster reasoning speed. Finally, an applet for safety report generation is implemented to demonstrate the feasibility and practicability of the safety compliance checking based on VQA.
引用
收藏
页数:11
相关论文
共 50 条
  • [41] Visual Question Answering with Question Representation Update (QRU)
    Li, Ruiyu
    Jia, Jiaya
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 29 (NIPS 2016), 2016, 29
  • [42] Visual Question Answering on 360° Images
    Chou, Shih-Han
    Chao, Wei-Lun
    Lai, Wei-Sheng
    Sun, Min
    Yang, Ming-Hsuan
    2020 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2020, : 1596 - 1605
  • [43] Medical visual question answering: A survey
    Lin, Zhihong
    Zhang, Donghao
    Tao, Qingyi
    Shi, Danli
    Haffari, Gholamreza
    Wu, Qi
    He, Mingguang
    Ge, Zongyuan
    ARTIFICIAL INTELLIGENCE IN MEDICINE, 2023, 143
  • [44] Chain of Reasoning for Visual Question Answering
    Wu, Chenfei
    Liu, Jinlai
    Wang, Xiaojie
    Dong, Xuan
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31
  • [45] Visual Question Answering as Reading Comprehension
    Li, Hui
    Wang, Peng
    Shen, Chunhua
    van den Hengel, Anton
    2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 6312 - 6321
  • [46] Revisiting Visual Question Answering Baselines
    Jabri, Allan
    Joulin, Armand
    van der Maaten, Laurens
    COMPUTER VISION - ECCV 2016, PT VIII, 2016, 9912 : 727 - 739
  • [47] Answer Distillation for Visual Question Answering
    Fang, Zhiwei
    Liu, Jing
    Tang, Qu
    Li, Yong
    Lu, Hanqing
    COMPUTER VISION - ACCV 2018, PT I, 2019, 11361 : 72 - 87
  • [48] iVQA: Inverse Visual Question Answering
    Liu, Feng
    Xiang, Tao
    Hospedales, Timothy M.
    Yang, Wankou
    Sun, Changyin
    2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, : 8611 - 8619
  • [49] VAQA: Visual Arabic Question Answering
    Kamel, Sarah M. M.
    Hassan, Shimaa I. I.
    Elrefaei, Lamiaa
    ARABIAN JOURNAL FOR SCIENCE AND ENGINEERING, 2023, 48 (08) : 10803 - 10823
  • [50] Adapted GooLeNet for Visual Question Answering
    Huang, Jie
    Hu, Yue
    Yang, Weilong
    2018 3RD INTERNATIONAL CONFERENCE ON MECHANICAL, CONTROL AND COMPUTER ENGINEERING (ICMCCE), 2018, : 603 - 606