Enabling Visual Object Detection With Object Sounds via Visual Modality Recalling Memory

Cited by: 2
Authors
Kim, Jung Uk [1 ,2 ]
Ro, Yong Man [1 ]
Affiliations
[1] Korea Adv Inst Sci & Technol KAIST, Sch Elect Engn, Image & Video Syst Lab, Daejeon 34141, South Korea
[2] Kyung Hee Univ, Dept Comp Sci & Engn, Visual Artificial Intelligence Lab, Yongin 17104, South Korea
Keywords
Memory network; modality recalling; object sound; visual object detection; audiovisual integration; association
DOI
10.1109/TNNLS.2023.3323560
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
When humans hear the sound of an object, they recall the associated visual information and integrate the sound with the recalled visual information to detect the object. In this article, we present a sound-based object detector that mimics this process. We design a visual modality recalling (VMR) memory that recalls visual-modality information from an audio input (i.e., sound). To this end, we propose a VMR loss and an audio-visual association loss, which guide the VMR memory to memorize visual information by establishing associations between the audio and visual modalities. Using the visual information recalled through the VMR memory together with the original audio input, we perform audio-visual integration. In this step, we introduce an integrated feature contrastive loss that lets the integrated feature be embedded as if it had been encoded from both audio and visual inputs. This guidance enables the sound-based object detector to perform visual object detection even when only sound is provided. We believe this work offers a new perspective on conventional object detection studies, which rely solely on the visual modality. Comprehensive experimental results demonstrate the effectiveness of the proposed method with the VMR memory.
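The abstract does not give the internals of the VMR memory, so the following is only an illustrative sketch of one common way such a recall step can be built: a key-value memory addressed by similarity. Everything here is an assumption and not the authors' implementation: the function names (`recall_visual`, `integrate`), the cosine-similarity addressing, the `temperature` parameter, and integration by concatenation are all hypothetical choices made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv + 1e-12)


def recall_visual(audio_feat, keys, values, temperature=0.1):
    """Hypothetical memory read (assumed design, not from the paper).

    Each memory slot pairs an audio-side key with a visual-side value.
    The audio feature addresses the slots by similarity to the keys, and
    the recalled visual feature is the weighted sum of the value slots.
    """
    weights = softmax([cosine(audio_feat, k) / temperature for k in keys])
    dim = len(values[0])
    recalled = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return recalled, weights


def integrate(audio_feat, recalled_visual):
    """Assumed integration step: plain concatenation of the two features."""
    return list(audio_feat) + list(recalled_visual)


# Toy memory with two slots: 2-D audio keys, 3-D visual values.
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

audio = [0.9, 0.1]  # closest to the first key
recalled, weights = recall_visual(audio, keys, values)
fused = integrate(audio, recalled)
```

In this toy setup the audio feature is closest to the first key, so the recalled vector is dominated by the first value slot. In a trained system the keys and values would be learned, with losses such as those named in the abstract shaping which visual information each slot stores.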
Pages: 341-353
Page count: 13
Published in: IEEE Transactions on Neural Networks and Learning Systems, 2025, 36 (01): 341-353