HAVE-Net: Hallucinated Audio-Visual Embeddings for Few-Shot Classification with Unimodal Cues

Cited: 0
Authors
Jha, Ankit [1 ]
Pal, Debabrata [1 ]
Singha, Mainak [1 ]
Agarwal, Naman [1 ]
Banerjee, Biplab [1 ]
Affiliations
[1] Indian Institute of Technology, Mumbai, Maharashtra, India
Keywords
Multimodal learning; Audio-visual remote sensing data; Few-shot learning; Meta-learning; CNN
DOI
10.1007/978-3-031-74640-6_32
CLC Classification Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Recognition of remote sensing (RS) or aerial images is currently of great interest, and recent advances in deep learning have further accelerated progress. However, challenges such as occlusion, intra-class variance, and lighting variation arise when training neural networks on unimodal RS visual input. Although joint training of audio-visual modalities improves classification performance in a low-data regime, it has yet to be thoroughly investigated in the RS domain. Here, we address a novel problem in which both the audio and visual modalities are present during meta-training of a few-shot learning (FSL) classifier, but one of the modalities may be missing during the meta-testing stage. This formulation is pertinent in the RS domain, given the difficulties of data acquisition and the possibility of sensor malfunction. To mitigate this, we propose a novel few-shot generative framework, the Hallucinated Audio-Visual Embeddings-Network (HAVE-Net), which meta-trains cross-modal features from limited unimodal data. Specifically, the hallucinated features are meta-learned on base classes and then used for few-shot classification on novel classes during the inference phase. Experimental results on the benchmark ADVANCE and AudioSetZSL datasets show that our hallucinated-modality augmentation strategy for few-shot classification outperforms a classifier trained with the real multimodal information by at least 0.8-2%.
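To make the hallucination idea from the abstract concrete, below is a minimal PyTorch-style sketch of one meta-training episode. It assumes a simple MLP hallucinator regressed onto real audio embeddings with a mean-squared-error loss, concatenation-based fusion, and a prototypical-network-style classifier over the fused features; all module names, dimensions, and loss weighting are illustrative assumptions and do not reproduce the authors' implementation.

```python
# Minimal sketch of the hallucinated-embedding idea from the abstract.
# The architecture, dimensions, loss weighting, and prototypical-network
# classifier are illustrative assumptions, not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Hallucinator(nn.Module):
    """Maps a visual embedding to a hallucinated (pseudo-)audio embedding."""
    def __init__(self, vis_dim=512, aud_dim=512, hidden=1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(vis_dim, hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, aud_dim),
        )

    def forward(self, vis_emb):
        return self.net(vis_emb)

def episode_loss(h, vis_sup, aud_sup, vis_qry, y_sup, y_qry):
    """One meta-training episode on base classes: regress real audio
    embeddings from visual ones, then classify queries with fused
    (visual + hallucinated audio) prototypes."""
    # Hallucination loss on the support set, where both modalities exist.
    aud_hat = h(vis_sup)
    halluc_loss = F.mse_loss(aud_hat, aud_sup)

    # At meta-test time audio may be missing, so queries always use
    # hallucinated audio; fuse by simple concatenation (an assumption).
    fused_sup = torch.cat([vis_sup, aud_hat], dim=-1)
    fused_qry = torch.cat([vis_qry, h(vis_qry)], dim=-1)

    # Prototypical-network-style classification: class-mean prototypes,
    # negative Euclidean distance as logits.
    classes = y_sup.unique()
    protos = torch.stack([fused_sup[y_sup == c].mean(0) for c in classes])
    logits = -torch.cdist(fused_qry, protos)  # (n_query, n_way)
    targets = torch.stack([(classes == y).nonzero().squeeze() for y in y_qry])
    return F.cross_entropy(logits, targets) + halluc_loss
```

Under this reading, inference on novel classes with a missing audio modality would need only the visual branch and the trained hallucinator to build the fused representation.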
Pages: 390-398
Page count: 9