Visual-auditory learning network for construction equipment action detection

Times cited: 11
Authors
Jung, Seunghoon [1]
Jeoung, Jaewon [1]
Lee, Dong-Eun [2]
Jang, Hyounseung [3]
Hong, Taehoon [1]
Affiliations
[1] Yonsei Univ, Dept Architecture & Architectural Engn, Seoul, South Korea
[2] Kyungpook Natl Univ, Sch Architecture Civil Engn Environm & Energy, Daegu, South Korea
[3] Seoul Natl Univ Sci & Technol, Sch Architecture, Seoul, South Korea
Funding
National Research Foundation of Singapore
Keywords
EARTHMOVING EXCAVATORS; MODEL;
DOI
10.1111/mice.12983
CLC number
TP39 [Computer applications]
Discipline codes
081203; 0835
Abstract
Action detection of construction equipment is critical for tracking project performance, facilitating construction automation, and improving the efficiency of construction site monitoring. In particular, auditory signals can provide information that complements computer vision-based action detection across various types of construction equipment. This study therefore develops a visual-auditory learning network model for the action detection of construction equipment based on two modalities (i.e., vision and audition). To this end, visual and auditory features are extracted by a multi-modal feature extractor. A multi-head attention and detection module is then designed to perform localization and classification in separate heads, each with an attention mechanism suited to its task: a content-based attention mechanism provides spatial attention in the localization head, while a dot-product attention mechanism provides channel attention in the classification head. The evaluation results show that the proposed model reaches a precision of 86.92% and a recall of 84.00% with the multi-head attention and detection module, which was shown to improve overall detection performance by exploiting different correlations between visual and auditory features for localization and classification, respectively.
Pages: 1916-1934 (19 pages)
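
As a concrete illustration of the two-head design described in the abstract, the sketch below pairs a content-based (additive) spatial-attention localization head with a dot-product channel-attention classification head over fused visual-auditory features. This is a minimal PyTorch sketch under stated assumptions, not the authors' implementation: the fusion scheme (concatenating a clip-level audio embedding onto the visual feature map), the layer sizes, and the anchor-based output format are all illustrative choices, and the class name `MultiHeadAttentionDetection` is hypothetical.

```python
import torch
import torch.nn as nn


class MultiHeadAttentionDetection(nn.Module):
    """Illustrative sketch of a multi-head attention and detection module:
    separate localization and classification heads, each with a different
    attention mechanism over fused visual-auditory features. Layer sizes,
    fusion scheme, and anchor format are assumptions, not the paper's."""

    def __init__(self, vis_dim=512, aud_dim=128, num_classes=5, num_anchors=9):
        super().__init__()
        fused = vis_dim + aud_dim
        # Content-based (additive) attention scoring a weight per spatial location
        self.spatial_score = nn.Sequential(
            nn.Conv2d(fused, fused // 8, kernel_size=1),
            nn.Tanh(),
            nn.Conv2d(fused // 8, 1, kernel_size=1),
        )
        self.loc_head = nn.Conv2d(fused, num_anchors * 4, kernel_size=3, padding=1)
        self.cls_head = nn.Conv2d(fused, num_anchors * num_classes, kernel_size=3, padding=1)

    def forward(self, vis_feat, aud_feat):
        # vis_feat: (B, vis_dim, H, W) visual feature map
        # aud_feat: (B, aud_dim) clip-level auditory embedding
        B, _, H, W = vis_feat.shape
        # Broadcast the audio embedding over space and concatenate (assumed fusion)
        aud_map = aud_feat[:, :, None, None].expand(-1, -1, H, W)
        x = torch.cat([vis_feat, aud_map], dim=1)

        # Localization head: content-based spatial attention reweights locations
        attn = torch.sigmoid(self.spatial_score(x))          # (B, 1, H, W)
        boxes = self.loc_head(x * attn)                      # per-anchor box regressions

        # Classification head: dot-product channel attention reweights channels
        flat = x.flatten(2)                                  # (B, C, H*W)
        gram = torch.softmax(
            flat @ flat.transpose(1, 2) / flat.size(-1) ** 0.5, dim=-1
        )                                                    # (B, C, C) channel affinities
        x_c = (gram @ flat).view_as(x)                       # channel-reweighted features
        logits = self.cls_head(x_c)                          # per-anchor class scores
        return boxes, logits
```

For example, calling the module with a (B, 512, H, W) visual feature map and a (B, 128) audio embedding returns per-anchor box regressions and class logits, which a standard detection loss (e.g., smooth L1 for boxes plus cross-entropy for classes) could supervise. The point of the split is that each head attends to a different correlation structure: where the fused features are informative (spatial) for localization, and which feature channels co-vary (channel) for classification.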