Safety compliance checking of construction behaviors using visual question answering

被引:15
|
作者
Ding, Yuexiong [1 ,2 ]
Liu, Muyang [1 ,2 ]
Luo, Xiaowei [1 ,2 ]
机构
[1] City Univ Hong Kong, Dept Architecture & Civil Engn, Hong Kong, Peoples R China
[2] City Univ Hong Kong, Shenzhen Res Inst, Architecture & Civil Engn Res Ctr, Shenzhen, Peoples R China
关键词
Construction safety management; Safety compliance checking; Visual reasoning; Visual question answering; Cross -modal model; Vision -and -language Transformer; FALLS;
D O I
10.1016/j.autcon.2022.104580
中图分类号
TU [建筑科学];
学科分类号
0813 ;
摘要
Unsafe construction behavior, one of the leading factors of accidents and casualties, can be reduced by strengthening construction inspection. However, current methods use either manual inspection or inefficient cross-modal models based on multiple backbone networks. To alleviate the problems, a "rule-question" trans-formation and annotation system is formulated, and the unsafe behavior detection is turned into a visual reasoning task: visual question answering (VQA). The VQA model is developed based on a vision-and-language Transformer, and the unsafe behavior could be identified based on the output answers. A dataset containing 16 safety rules and 2386 related construction images is used to fine-tune and validate the VQA model. The results show that the developed VQA model achieves an average recall of 0.81 at a faster reasoning speed. Finally, an applet for safety report generation is implemented to demonstrate the feasibility and practicability of the safety compliance checking based on VQA.
引用
收藏
页数:11
相关论文
共 50 条
  • [31] Question action relevance and editing for visual question answering
    Andeep S. Toor
    Harry Wechsler
    Michele Nappi
    Multimedia Tools and Applications, 2019, 78 : 2921 - 2935
  • [32] Question-Agnostic Attention for Visual Question Answering
    Farazi, Moshiur
    Khan, Salman
    Barnes, Nick
    2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021, : 3542 - 3549
  • [33] Question action relevance and editing for visual question answering
    Toor, Andeep S.
    Wechsler, Harry
    Nappi, Michele
    MULTIMEDIA TOOLS AND APPLICATIONS, 2019, 78 (03) : 2921 - 2935
  • [34] Multi-Question Learning for Visual Question Answering
    Lei, Chenyi
    Wu, Lei
    Liu, Dong
    Li, Zhao
    Wang, Guoxin
    Tang, Haihong
    Li, Houqiang
    THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 11328 - 11335
  • [35] Question Type Guided Attention in Visual Question Answering
    Shi, Yang
    Furlanello, Tommaso
    Zha, Sheng
    Anandkumar, Animashree
    COMPUTER VISION - ECCV 2018, PT IV, 2018, 11208 : 158 - 175
  • [36] Differential Attention for Visual Question Answering
    Patro, Badri
    Namboodiri, Vinay P.
    2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, : 7680 - 7688
  • [37] Structured Attentions for Visual Question Answering
    Zhu, Chen
    Zhao, Yanpeng
    Huang, Shuaiyi
    Tu, Kewei
    Ma, Yi
    2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017, : 1300 - 1309
  • [38] An Analysis of Visual Question Answering Algorithms
    Kafle, Kushal
    Kanan, Christopher
    2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017, : 1983 - 1991
  • [39] Multimodal Attention for Visual Question Answering
    Kodra, Lorena
    Mece, Elinda Kajo
    INTELLIGENT COMPUTING, VOL 1, 2019, 858 : 783 - 792
  • [40] Affective Visual Question Answering Network
    Ruwa, Nelson
    Mao, Qirong
    Wang, Liangjun
    Dong, Ming
    IEEE 1ST CONFERENCE ON MULTIMEDIA INFORMATION PROCESSING AND RETRIEVAL (MIPR 2018), 2018, : 170 - 173