Maximizing mutual information inside intra- and inter-modality for audio-visual event retrieval

Citations: 0
Authors
Li, Ruochen [1]
Li, Nannan [1]
Wang, Wenmin [1]
Affiliations
[1] Macau Univ Sci & Technol, Sch Engn & Comp Sci, Ave Wai Long, Taipa 999078, Macau, Peoples R China
Keywords
Audio-visual retrieval; Variational autoencoder; Mutual information; InfoMax-VAE
DOI
10.1007/s13735-023-00276-7
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The human brain can process sound and visual information in overlapping areas of the cerebral cortex, which means that audio and visual information are deeply correlated with each other when we explore the world. To simulate this function of the human brain, audio-visual event retrieval (AVER) has been proposed. AVER uses data from one modality (e.g., audio) to query data from another. In this work, we aim to improve the performance of audio-visual event retrieval. To achieve this goal, we first propose a novel network, InfoIIM, which enhances the accuracy of intra-modal feature representation and inter-modal feature alignment. The backbone of this network is a parallel connection of two VAE models with two different encoders and a shared decoder. Second, to enable the VAE to learn better feature representations and to improve intra-modal retrieval performance, we use InfoMax-VAE instead of the vanilla VAE model. Additionally, we study the influence of modality-shared features on the effectiveness of audio-visual event retrieval. To verify the effectiveness of the proposed method, we validate our model on the AVE dataset, and the results show that it outperforms several existing algorithms on most metrics. Finally, we present our future research directions, hoping to inspire relevant researchers.
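The backbone described in the abstract (two modality-specific encoders feeding one shared decoder, with retrieval across modalities) can be illustrated with a minimal, untrained sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the single linear layers, the dimensions, the random weights, cosine similarity as the ranking score, and all function and variable names. The mutual information term that distinguishes InfoMax-VAE from the vanilla VAE is omitted; only the standard Gaussian KL term is shown.

```python
import math
import random


def make_linear(in_dim, out_dim, rng):
    """Random (out_dim x in_dim) weight matrix; a stand-in for a trained layer."""
    return [[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]


def apply_linear(weights, x):
    """Matrix-vector product."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


class Encoder:
    """Hypothetical modality-specific encoder: input -> (mean, log-variance)."""

    def __init__(self, in_dim, latent_dim, rng):
        self.w_mu = make_linear(in_dim, latent_dim, rng)
        self.w_logvar = make_linear(in_dim, latent_dim, rng)

    def __call__(self, x):
        return apply_linear(self.w_mu, x), apply_linear(self.w_logvar, x)


def reparameterize(mu, logvar, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def kl_to_standard_normal(mu, logvar):
    """KL(N(mu, sigma^2) || N(0, I)), the regulariser of a vanilla VAE."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query_latent, gallery_latents):
    """Gallery indices sorted by descending cosine similarity to the query."""
    scores = [cosine(query_latent, g) for g in gallery_latents]
    return sorted(range(len(gallery_latents)), key=lambda i: scores[i], reverse=True)


# Assumed toy dimensions; the abstract does not state any sizes.
AUDIO_DIM, VISUAL_DIM, LATENT_DIM, OUT_DIM = 8, 16, 4, 6
rng = random.Random(0)

audio_encoder = Encoder(AUDIO_DIM, LATENT_DIM, rng)    # encoder for audio features
visual_encoder = Encoder(VISUAL_DIM, LATENT_DIM, rng)  # encoder for visual features
shared_decoder = make_linear(LATENT_DIM, OUT_DIM, rng) # one decoder used by both branches

audio_clip = [rng.random() for _ in range(AUDIO_DIM)]
video_clips = [[rng.random() for _ in range(VISUAL_DIM)] for _ in range(3)]

# Audio branch: encode, sample, decode with the shared decoder.
audio_mu, audio_logvar = audio_encoder(audio_clip)
audio_z = reparameterize(audio_mu, audio_logvar, rng)
audio_recon = apply_linear(shared_decoder, audio_z)

# Visual branch: same latent size, same decoder weights.
video_mus = [visual_encoder(v)[0] for v in video_clips]
video_recon = apply_linear(shared_decoder, video_mus[0])

# Cross-modal retrieval: rank video clips for an audio query in latent space.
ranking = retrieve(audio_mu, video_mus)
print("ranking of video clips for the audio query:", ranking)
```

Because the weights are random, the ranking itself is meaningless; the sketch only shows how both branches map into a latent space of the same size, which is what makes a query from one modality comparable with items from the other.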
Pages: 9