Audio-Visual Generalized Zero-Shot Learning Based on Variational Information Bottleneck

Authors
Li, Yapeng
Luo, Yong [1]
Du, Bo [1]
Affiliations
[1] Wuhan Univ, Inst Artificial Intelligence, Sch Comp Sci, Natl Engn Res Ctr Multimedia Software, Wuhan 430072, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Audio-visual; generalized zero-shot learning; information bottleneck; multi-modality fusion
DOI
10.1109/ICME55011.2023.00084
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Audio-visual generalized zero-shot learning (GZSL) aims to train a model on seen classes so that it can classify samples from both seen and unseen classes. Because no unseen-class samples are available during training, the model tends to misclassify unseen-class samples as seen classes. To mitigate this problem, we propose a method based on the variational information bottleneck for audio-visual GZSL. Specifically, we model the joint representations as a product-of-experts over the marginal representations to integrate audio and visual information. In addition, we apply the variational information bottleneck to the learning of the audio-visual joint representations and of the marginal representations of the audio, visual, and text label modalities. This helps the model reduce the negative impact of information that does not generalize to unseen classes. Experimental results on the UCF-GZSL, VGGSound-GZSL, and ActivityNet-GZSL benchmarks demonstrate the effectiveness and superiority of the proposed model for audio-visual GZSL.
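The abstract does not give the model's equations, so the following is only a minimal sketch of the two ingredients it names, under stated assumptions: each modality encoder is assumed to output a diagonal Gaussian, the product-of-experts is taken over those Gaussians (precision-weighted fusion), and the bottleneck term is illustrated as a KL divergence to a standard normal prior, as is common in variational information bottleneck formulations. The function names and the toy numbers are hypothetical and are not taken from the paper.

```python
import math


def product_of_experts(mus, variances):
    """Fuse diagonal Gaussian experts N(mu_i, var_i) into one Gaussian.

    Assumption: the product of Gaussians is itself Gaussian, with precision
    equal to the sum of the experts' precisions and a precision-weighted mean.
    """
    dims = len(mus[0])
    fused_mu, fused_var = [], []
    for d in range(dims):
        precision = sum(1.0 / var[d] for var in variances)
        weighted = sum(mu[d] / var[d] for mu, var in zip(mus, variances))
        fused_var.append(1.0 / precision)
        fused_mu.append(weighted / precision)
    return fused_mu, fused_var


def kl_to_standard_normal(mu, var):
    """KL(N(mu, diag(var)) || N(0, I)), a typical bottleneck penalty (assumed)."""
    return 0.5 * sum(v + m * m - 1.0 - math.log(v) for m, v in zip(mu, var))


# Hypothetical marginal posteriors for one sample (2-dimensional latent).
audio_mu, audio_var = [1.0, 0.0], [1.0, 4.0]
visual_mu, visual_var = [3.0, 2.0], [1.0, 4.0]

joint_mu, joint_var = product_of_experts(
    [audio_mu, visual_mu], [audio_var, visual_var]
)
joint_kl = kl_to_standard_normal(joint_mu, joint_var)
```

In this sketch the fused variance is never larger than the smallest expert variance in each dimension, which is why a product-of-experts tends to sharpen the joint representation when the modalities agree. How the paper weights the KL term against the classification objective is not stated in the abstract and is left out here.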
Pages: 450-455
Number of pages: 6