Enabling Visual Object Detection With Object Sounds via Visual Modality Recalling Memory

Cited by: 2
Authors
Kim, Jung Uk [1 ,2 ]
Ro, Yong Man [1 ]
Affiliations
[1] Korea Adv Inst Sci & Technol KAIST, Sch Elect Engn, Image & Video Syst Lab, Daejeon 34141, South Korea
[2] Kyung Hee Univ, Dept Comp Sci & Engn, Visual Artificial Intelligence Lab, Yongin 17104, South Korea
Keywords
Memory network; modality recalling; object sound; visual object detection; audiovisual integration; association
DOI
10.1109/TNNLS.2023.3323560
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
When humans hear the sound of an object, they recall associated visual information and integrate the sound with recalled visual modality to detect the object. In this article, we present a novel sound-based object detector that mimics this process. We design a visual modality recalling (VMR) memory to recall information of a visual modality based on an audio modal input (i.e., sound). To achieve this goal, we propose a VMR loss and an audio-visual association loss to guide the VMR memory to memorize visual modal information by establishing associations between audio and visual modalities. With the visual modal information recalled through the VMR memory along with the original audio input, we perform audio-visual integration. In this step, we introduce an integrated feature contrastive loss that allows the integrated feature to be embedded as if it were encoded using both audio and visual modal inputs. This guidance enables our sound-based object detector to effectively perform visual object detection even when only sound is provided. We believe that our work is a cornerstone study that offers a new perspective to conventional object detection studies that solely rely on the visual modality. Comprehensive experimental results demonstrate the effectiveness of the proposed method with the VMR memory.
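The recall step described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not specify how the VMR memory is addressed, so this example assumes a generic key-value memory read with softmax attention over cosine similarities, plus a mean-squared-error term standing in for the VMR loss. The names `audio_keys`, `visual_values`, `temperature`, `recall_visual`, and `recall_loss` are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-8)


def softmax(scores, temperature=0.1):
    """Numerically stable softmax with a temperature (assumed hyperparameter)."""
    top = max(scores)
    exps = [math.exp((s - top) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def recall_visual(audio_feat, audio_keys, visual_values, temperature=0.1):
    """Read a visual feature from memory using an audio feature as the query.

    Assumption: slot i pairs an audio key with a visual value, and the read
    is a similarity-weighted sum of the visual values.
    """
    weights = softmax([cosine(audio_feat, k) for k in audio_keys], temperature)
    dim = len(visual_values[0])
    recalled = [
        sum(w * v[d] for w, v in zip(weights, visual_values)) for d in range(dim)
    ]
    return recalled, weights


def recall_loss(recalled, visual_feat):
    """Mean squared error between recalled and true visual features.

    Assumption: a simple stand-in for the VMR loss named in the abstract.
    """
    return sum((r - v) ** 2 for r, v in zip(recalled, visual_feat)) / len(recalled)
```

During training, a loss of this kind would push the recalled feature toward the feature produced by a real visual encoder, so that at test time the audio input alone can retrieve it. The audio-visual association loss and the integrated feature contrastive loss mentioned in the abstract are not sketched here.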
Pages: 341-353
Page count: 13