Enabling Visual Object Detection With Object Sounds via Visual Modality Recalling Memory

Cited by: 2
Authors
Kim, Jung Uk [1, 2]
Ro, Yong Man [1]
Affiliations
[1] Korea Adv Inst Sci & Technol KAIST, Sch Elect Engn, Image & Video Syst Lab, Daejeon 34141, South Korea
[2] Kyung Hee Univ, Dept Comp Sci & Engn, Visual Artificial Intelligence Lab, Yongin 17104, South Korea
Keywords
Memory network; modality recalling; object sound; visual object detection; audiovisual integration; association
DOI
10.1109/TNNLS.2023.3323560
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
When humans hear the sound of an object, they recall the associated visual information and integrate the sound with the recalled visual modality to detect the object. In this article, we present a novel sound-based object detector that mimics this process. We design a visual modality recalling (VMR) memory that recalls visual modal information from an audio modal input (i.e., sound). To achieve this goal, we propose a VMR loss and an audio-visual association loss, which guide the VMR memory to memorize visual modal information by establishing associations between the audio and visual modalities. Using the visual modal information recalled through the VMR memory together with the original audio input, we perform audio-visual integration. In this step, we introduce an integrated feature contrastive loss that allows the integrated feature to be embedded as if it were encoded from both audio and visual modal inputs. This guidance enables our sound-based object detector to perform visual object detection effectively even when only sound is provided. We believe that our work is a cornerstone study that offers a new perspective on conventional object detection studies, which rely solely on the visual modality. Comprehensive experimental results demonstrate the effectiveness of the proposed method with the VMR memory.
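The recall step described in the abstract can be pictured with a generic key-value memory. The sketch below is illustrative only and is not the authors' implementation: the abstract does not specify the memory layout, the addressing scheme, or the form of the losses. The names `recall_visual` and `recall_loss`, the cosine-similarity addressing, the `temperature` parameter, and the mean-squared-error stand-in for a recall-style loss are all assumptions made for this example.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    norm = math.sqrt(dot(a, a)) * math.sqrt(dot(b, b))
    return dot(a, b) / norm if norm > 0 else 0.0


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def recall_visual(audio_feat, keys, values, temperature=0.1):
    """Hypothetical recall: address memory slots with an audio feature.

    keys   -- one audio-side key vector per slot (assumed layout)
    values -- one visual-side value vector per slot (assumed layout)
    Returns the recalled visual feature and the addressing weights.
    """
    scores = [cosine(audio_feat, k) / temperature for k in keys]
    weights = softmax(scores)
    dim = len(values[0])
    recalled = [sum(w * v[i] for w, v in zip(weights, values))
                for i in range(dim)]
    return recalled, weights


def recall_loss(recalled, visual_feat):
    """Assumed stand-in for a recall-style loss: mean squared error
    between the recalled feature and the true visual feature."""
    return sum((r - v) ** 2
               for r, v in zip(recalled, visual_feat)) / len(visual_feat)


# Toy memory with two slots; all numbers are made up for illustration.
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
audio = [0.9, 0.1]  # audio feature close to the first key

recalled, weights = recall_visual(audio, keys, values)
loss_match = recall_loss(recalled, [1.0, 0.0, 0.0])
loss_mismatch = recall_loss(recalled, [0.0, 1.0, 0.0])
```

In this toy setup the audio feature lies close to the first key, so the recalled feature is dominated by the first value slot and the loss against that slot's visual feature is the smaller of the two. During training, a loss of this kind would push the memory to return visual features that match the paired image features, which is the role the abstract assigns to the VMR loss.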
Pages: 341-353
Number of pages: 13