Enabling Visual Object Detection With Object Sounds via Visual Modality Recalling Memory

Cited by: 2
Authors
Kim, Jung Uk [1, 2]
Ro, Yong Man [1]
Affiliations
[1] Korea Adv Inst Sci & Technol KAIST, Sch Elect Engn, Image & Video Syst Lab, Daejeon 34141, South Korea
[2] Kyung Hee Univ, Dept Comp Sci & Engn, Visual Artificial Intelligence Lab, Yongin 17104, South Korea
Keywords
Memory network; modality recalling; object sound; visual object detection; audiovisual integration; association
DOI
10.1109/TNNLS.2023.3323560
CLC classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
When humans hear the sound of an object, they recall associated visual information and integrate the sound with the recalled visual modality to detect the object. In this article, we present a novel sound-based object detector that mimics this process. We design a visual modality recalling (VMR) memory that recalls visual-modality information from an audio-modal input (i.e., sound). To achieve this, we propose a VMR loss and an audio-visual association loss that guide the VMR memory to memorize visual-modal information by establishing associations between the audio and visual modalities. We then perform audio-visual integration using the visual-modal information recalled through the VMR memory together with the original audio input. In this step, we introduce an integrated feature contrastive loss that encourages the integrated feature to be embedded as if it had been encoded from both audio and visual inputs. This guidance enables our sound-based object detector to perform visual object detection effectively even when only sound is provided. We believe this work is a cornerstone study that offers a new perspective on conventional object detection, which relies solely on the visual modality. Comprehensive experimental results demonstrate the effectiveness of the proposed method with the VMR memory.
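The abstract describes two mechanisms: recalling visual features from a sound-only query through a memory, and pulling the sound-only integrated feature toward one built from real audio and visual inputs. The sketch below illustrates those two ideas in NumPy under clearly stated assumptions; it is not the authors' implementation. The class name `RecallMemory`, the key-value memory with cosine-similarity addressing, the mean-squared-error stand-in for the VMR loss, concatenation as the integration step, and the InfoNCE form of the contrastive loss are all hypothetical choices made for illustration. The audio-visual association loss is not sketched.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def l2_normalize(x, axis=-1, eps=1e-8):
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


class RecallMemory:
    """Hypothetical key-value memory: keys are addressed by audio features,
    values hold visual features to be 'recalled' (an assumption, not the paper's design)."""

    def __init__(self, num_slots, audio_dim, visual_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.keys = rng.standard_normal((num_slots, audio_dim))
        self.values = rng.standard_normal((num_slots, visual_dim))

    def recall(self, audio_feat, temperature=1.0):
        # Cosine similarity between each audio query and each memory key: (B, S).
        sim = l2_normalize(audio_feat) @ l2_normalize(self.keys).T
        weights = softmax(sim / temperature, axis=-1)
        # Weighted sum of visual value slots: (B, visual_dim).
        return weights @ self.values, weights


def recall_loss(recalled, visual_feat):
    # Assumed stand-in for a recall objective: MSE to the true visual feature.
    return float(np.mean((recalled - visual_feat) ** 2))


def integrate(audio_feat, visual_feat):
    # Assumed integration step: plain feature concatenation.
    return np.concatenate([audio_feat, visual_feat], axis=-1)


def info_nce(anchor, positive, temperature=0.1):
    """InfoNCE: the i-th anchor should match the i-th positive among the batch."""
    a = l2_normalize(anchor)
    p = l2_normalize(positive)
    logits = a @ p.T / temperature  # (B, B)
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    audio = rng.standard_normal((4, 16))   # toy batch of sound features
    visual = rng.standard_normal((4, 32))  # toy batch of paired visual features
    mem = RecallMemory(num_slots=8, audio_dim=16, visual_dim=32)
    recalled, weights = mem.recall(audio)
    sound_only = integrate(audio, recalled)  # what is available at test time
    both = integrate(audio, visual)          # training-time audio-visual target
    loss = recall_loss(recalled, visual) + info_nce(sound_only, both)
    print(recalled.shape, weights.shape, sound_only.shape, loss)
```

In this toy setup, the contrastive term treats the sound-only integrated feature as the anchor and the audio-visual integrated feature from the same sample as its positive, with the other samples in the batch serving as negatives. That is one plausible reading of "embedded as if it were encoded using both audio and visual modal inputs", not a confirmed reproduction of the paper's loss.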
Pages: 341-353
Number of pages: 13
Related papers
(50 in total)
  • [21] Natural sounds activate object-related visual cortex
    Naumer, MJ
    Wibral, M
    Singer, W
    Muckli, L
    JOURNAL OF COGNITIVE NEUROSCIENCE, 2005 : 83
  • [22] Novelty assessment in visual object recognition memory
    Kishiyama, M
    Yonelinas, AP
    JOURNAL OF COGNITIVE NEUROSCIENCE, 2000 : 131
  • [23] Threat as a Feature in Visual Semantic Object Memory
    Calley, Clifford S.
    Motes, Michael A.
    Chiang, H-Sheng
    Buhl, Virginia
    Spence, Jeffrey S.
    Abdi, Hervé
    Anand, Raksha
    Maguire, Mandy
    Estevez, Leonardo
    Briggs, Richard
    Freeman, Thomas
    Kraut, Michael A.
    Hart, John, Jr.
    HUMAN BRAIN MAPPING, 2013, 34 (08) : 1946 - 1955
  • [24] Object marking in German Sign Language (Deutsche Gebärdensprache): Differential object marking and object shift in the visual modality
    Bross, Fabian
    GLOSSA-A JOURNAL OF GENERAL LINGUISTICS, 2020, 5 (01)
  • [25] Object-based storage in visual working memory and the visual hierarchy
    Gao, Tao
    Shen, Mowei
    Gao, Zaifeng
    Li, Jie
    VISUAL COGNITION, 2008, 16 (01) : 103 - 106
  • [26] Salient Object Detection via Fusion of Multi-Visual Perception
    Zhou, Wenjun
    Wang, Tianfei
    Wu, Xiaoqin
    Zuo, Chenglin
    Wang, Yifan
    Zhang, Quan
    Peng, Bo
    APPLIED SCIENCES-BASEL, 2024, 14 (08)
  • [27] A visual tracking method via object detection based on deep learning
    Tang C.
    Ling Y.
    Yang H.
    Yang X.
    Zheng C.
    Hongwai yu Jiguang Gongcheng/Infrared and Laser Engineering, 2018, 47 (05)
  • [28] Lightweight Salient Object Detection via Hierarchical Visual Perception Learning
    Liu, Yun
    Gu, Yu-Chao
    Zhang, Xin-Yu
    Wang, Weiwei
    Cheng, Ming-Ming
    IEEE TRANSACTIONS ON CYBERNETICS, 2021, 51 (09) : 4439 - 4449
  • [29] Learned Filters for Object Detection in Multi-object Visual Tracking
    Stamatescu, Victor
    Wong, Sebastien
    McDonnell, Mark D.
    Kearney, David
    AUTOMATIC TARGET RECOGNITION XXVI, 2016, 9844
  • [30] Object-position binding in visual memory for natural scenes and object arrays
    Hollingworth, Andrew
    JOURNAL OF EXPERIMENTAL PSYCHOLOGY-HUMAN PERCEPTION AND PERFORMANCE, 2007, 33 (01) : 31 - 47