Maximizing mutual information inside intra- and inter-modality for audio-visual event retrieval

Times cited: 0
Authors
Li, Ruochen [1]
Li, Nannan [1]
Wang, Wenmin [1]
Affiliation
[1] Macau Univ Sci & Technol, Sch Engn & Comp Sci, Ave Wai Long, Taipa 999078, Macau, Peoples R China
Keywords
Audio-visual retrieval; Variational autoencoder; Mutual information; InfoMax-VAE
DOI
10.1007/s13735-023-00276-7
CLC number (Chinese Library Classification)
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The human brain processes auditory and visual information in overlapping areas of the cerebral cortex, which suggests that audio and visual information are deeply correlated when we explore the world. To simulate this capability, the task of audio-visual event retrieval (AVER) has been proposed: data from one modality (e.g., audio) is used as a query to retrieve matching data from the other modality (e.g., video). In this work, we aim to improve AVER performance. First, we propose a novel network, InfoIIM, which enhances the accuracy of intra-modal feature representation and inter-modal feature alignment. Its backbone consists of two VAE models connected in parallel, with two different encoders and a shared decoder. Second, to help the VAEs learn better feature representations and to improve intra-modal retrieval performance, we use InfoMax-VAE instead of the vanilla VAE. Additionally, we study how modality-shared features influence the effectiveness of audio-visual event retrieval. We evaluate our model on the AVE dataset, and the results show that it outperforms several existing algorithms on most metrics. Finally, we outline future research directions that we hope will inspire further work in this area.
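The backbone described in the abstract (two parallel VAEs with modality-specific encoders, one shared decoder, and an InfoMax-VAE objective that adds a mutual-information term to the ELBO) can be sketched roughly as follows. This is a minimal NumPy illustration under our own assumptions, not the authors' InfoIIM code: the feature sizes, the linear layers, the random stand-in features, the bilinear critic used for the Donsker-Varadhan mutual-information bound, the cross-modal MI term `inter_mi`, and retrieval by cosine similarity of latent means are all hypothetical choices. We also assume both modalities are pre-projected to the same feature size so the shared decoder can reconstruct either one. The sketch only evaluates the loss with untrained weights; a real model would learn these parameters with an autodiff framework.

```python
import numpy as np

rng = np.random.default_rng(0)


def init_linear(in_dim, out_dim):
    """Small random weights for an untrained linear map (hypothetical layer)."""
    return rng.normal(0.0, 0.1, size=(in_dim, out_dim))


class GaussianEncoder:
    """Modality-specific encoder: features -> mean and log-variance of q(z|x)."""

    def __init__(self, in_dim, z_dim):
        self.w_mu = init_linear(in_dim, z_dim)
        self.w_logvar = init_linear(in_dim, z_dim)

    def __call__(self, x):
        return x @ self.w_mu, x @ self.w_logvar


class SharedDecoder:
    """A single decoder reused by both the audio and the visual branch."""

    def __init__(self, z_dim, out_dim):
        self.w = init_linear(z_dim, out_dim)

    def __call__(self, z):
        return z @ self.w


def reparameterize(mu, logvar):
    """Sample z = mu + sigma * eps with eps ~ N(0, I)."""
    return mu + np.exp(0.5 * logvar) * rng.standard_normal(mu.shape)


def kl_to_standard_normal(mu, logvar):
    """KL(q(z|x) || N(0, I)), summed over latent dims, averaged over the batch."""
    return np.mean(0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar, axis=1))


def dv_mi_lower_bound(a, b, w_critic):
    """Donsker-Varadhan bound: I(A;B) >= E_joint[T] - log E_marginal[exp(T)].

    Uses a bilinear critic T(a, b) = a^T W b; shuffling b within the batch
    approximates samples from the product of the marginals.
    """
    t_joint = np.sum((a @ w_critic) * b, axis=1)
    t_marg = np.sum((a @ w_critic) * b[rng.permutation(len(b))], axis=1)
    m = t_marg.max()  # subtract the max for a numerically stable log-mean-exp
    return np.mean(t_joint) - (m + np.log(np.mean(np.exp(t_marg - m))))


def branch_loss(x, encoder, decoder, w_critic, beta):
    """InfoMax-VAE-style loss for one modality: -ELBO - beta * I(x; z)."""
    mu, logvar = encoder(x)
    z = reparameterize(mu, logvar)
    # Squared error: a unit-variance Gaussian NLL up to scale and constants.
    recon = np.mean(np.sum((decoder(z) - x) ** 2, axis=1))
    kl = kl_to_standard_normal(mu, logvar)
    mi = dv_mi_lower_bound(x, z, w_critic)
    return recon + kl - beta * mi, mu, z


def retrieve(query_mu, gallery_mu):
    """Rank gallery items by cosine similarity to each query, best first."""
    q = query_mu / np.linalg.norm(query_mu, axis=1, keepdims=True)
    g = gallery_mu / np.linalg.norm(gallery_mu, axis=1, keepdims=True)
    return np.argsort(-(q @ g.T), axis=1)


# Hypothetical sizes; both modalities are assumed pre-projected to feat_dim.
feat_dim, z_dim, batch = 64, 16, 32
audio = rng.standard_normal((batch, feat_dim))   # stand-in audio features
visual = rng.standard_normal((batch, feat_dim))  # stand-in visual features

enc_a = GaussianEncoder(feat_dim, z_dim)
enc_v = GaussianEncoder(feat_dim, z_dim)
dec = SharedDecoder(z_dim, feat_dim)  # the same decoder serves both branches

loss_a, mu_a, z_a = branch_loss(audio, enc_a, dec, init_linear(feat_dim, z_dim), beta=1.0)
loss_v, mu_v, z_v = branch_loss(visual, enc_v, dec, init_linear(feat_dim, z_dim), beta=1.0)
inter_mi = dv_mi_lower_bound(z_a, z_v, init_linear(z_dim, z_dim))  # hypothetical cross-modal term
total_loss = loss_a + loss_v - inter_mi

ranking = retrieve(mu_a, mu_v)  # audio queries -> visual gallery
print(f"total loss {total_loss:.3f}; top visual match for audio 0: {ranking[0, 0]}")
```

Reusing `dec` in both branches pushes the audio and visual latents toward a common space, which is what makes comparing `mu_a` against `mu_v` meaningful in this sketch. The subtracted `inter_mi` term is one plausible way to express inter-modal alignment as mutual-information maximization; the paper's actual alignment loss may differ.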
Pages: 9
Related papers (37 in total)
  • [1] Maximizing mutual information inside intra- and inter-modality for audio-visual event retrieval
    Li, Ruochen
    Li, Nannan
    Wang, Wenmin
    International Journal of Multimedia Information Retrieval, 2023, 12
  • [2] Improving Intra- and Inter-Modality Visual Relation for Image Captioning
    Wang, Yong
    Zhang, WenKai
    Liu, Qing
    Zhang, Zhengyuan
    Gao, Xin
    Sun, Xian
    MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, 2020, : 4190 - 4198
  • [3] Dynamic Fusion with Intra- and Inter-modality Attention Flow for Visual Question Answering
    Gao, Peng
    Jiang, Zhengkai
    You, Haoxuan
    Lu, Pan
    Hoi, Steven
    Wang, Xiaogang
    Li, Hongsheng
    2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 6632 - 6641
  • [4] Supervised Intra- and Inter-Modality Similarity Preserving Hashing for Cross-Modal Retrieval
    Chen, Zhikui
    Zhong, Fangming
    Min, Geyong
    Leng, Yonglin
    Ying, Yiming
    IEEE ACCESS, 2018, 6 : 27796 - 27808
  • [5] Intra- and inter-modality registration of functional and anatomical clinical images
    Eberl, S
    Braun, M
    NEW APPROACHES IN MEDICAL IMAGE ANALYSIS, 1999, 3747 : 102 - 114
  • [6] Cross-Modal Image-Recipe Retrieval via Intra- and Inter-Modality Hybrid Fusion
    Li, Jiao
    Sun, Jialiang
    Xu, Xing
    Yu, Wei
    Shen, Fumin
    PROCEEDINGS OF THE 2021 INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL (ICMR '21), 2021, : 173 - 182
  • [7] Contrastive Intra- and Inter-Modality Generation for Enhancing Incomplete Multimedia Recommendation
    Lin, Zhenghong
    Tan, Yanchao
    Zhan, Yunfei
    Liu, Weiming
    Wang, Fan
    Chen, Chaochao
    Wang, Shiping
    Yang, Carl
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 6234 - 6242
  • [8] Fusion of Intra- and Inter-modality Algorithms for Face-Sketch Recognition
    Galea, Christian
    Farrugia, Reuben A.
    COMPUTER ANALYSIS OF IMAGES AND PATTERNS, CAIP 2015, PT II, 2015, 9257 : 700 - 711
  • [9] Semantic-aware matrix factorization hashing with intra- and inter-modality fusion for image-text retrieval
    Shi, Dongxue
    Liu, Zheng
    Gao, Shanshan
    Li, Ang
    APPLIED INTELLIGENCE, 2025, 55 (01)
  • [10] Modeling Both Intra- and Inter-Modality Uncertainty for Multimodal Fake News Detection
    Wei, Lingwei
    Hu, Dou
    Zhou, Wei
    Hu, Songlin
    IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 7906 - 7916