HAVE-Net: Hallucinated Audio-Visual Embeddings for Few-Shot Classification with Unimodal Cues

Cited by: 0
Authors
Jha, Ankit [1]
Pal, Debabrata [1]
Singha, Mainak [1]
Agarwal, Naman [1]
Banerjee, Biplab [1]
Affiliation
[1] Indian Inst Technol, Mumbai, Maharashtra, India
Keywords
Multimodal learning; Audio-Visual remote sensing data; Few-shot learning; Meta-learning; CNN;
DOI
10.1007/978-3-031-74640-6_32
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recognition of remote sensing (RS) or aerial images is currently of great interest, and advances in deep learning algorithms have added momentum to it in recent years. Training neural networks on unimodal RS visual input can suffer from occlusion, intra-class variance, lighting changes, and similar issues. Although joint training on audio-visual modalities improves classification performance in a low-data regime, it has yet to be thoroughly investigated in the RS domain. Here, we aim to solve a novel problem in which both the audio and visual modalities are present during the meta-training of a few-shot learning (FSL) classifier, but one of the modalities might be missing during the meta-testing stage. This problem formulation is pertinent in the RS domain, given the difficulties of data acquisition and the possibility of sensor malfunction. To mitigate this, we propose a novel few-shot generative framework, Hallucinated Audio-Visual Embeddings-Network (HAVE-Net), to meta-train cross-modal features from limited unimodal data. Specifically, these hallucinated features are meta-learned from base classes and used for few-shot classification on novel classes during the inference phase. Experimental results on the benchmark ADVANCE and AudioSetZSL datasets show that our hallucinated modality augmentation strategy for few-shot classification outperforms a classifier trained with real multimodal information by at least 0.8-2%.
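The missing-modality setting described in the abstract can be illustrated with a small sketch. This is not the HAVE-Net architecture, which the abstract does not specify: a ridge-regression map stands in for the generative hallucinator, a nearest-prototype rule stands in for the few-shot classifier, and all function names (`fit_hallucinator`, `prototypes`, `classify`, `fuse`) and the synthetic data are assumptions made only for illustration.

```python
import numpy as np


def fit_hallucinator(visual, audio, ridge=1e-3):
    """Fit a linear map W so that visual @ W approximates audio.

    Assumption: ridge regression is a simple stand-in for a learned
    cross-modal generator trained on base classes with both modalities.
    """
    dim = visual.shape[1]
    gram = visual.T @ visual + ridge * np.eye(dim)
    return np.linalg.solve(gram, visual.T @ audio)


def prototypes(features, labels, n_classes):
    """Mean embedding per class, computed from the support set."""
    return np.stack(
        [features[labels == c].mean(axis=0) for c in range(n_classes)]
    )


def classify(queries, protos):
    """Assign each query to the prototype at the smallest squared distance."""
    diffs = queries[:, None, :] - protos[None, :, :]
    return (diffs ** 2).sum(axis=-1).argmin(axis=1)


def fuse(visual, audio):
    """Concatenate visual and (real or hallucinated) audio embeddings."""
    return np.concatenate([visual, audio], axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)

    # Synthetic "base class" data where both modalities are available.
    true_map = rng.normal(size=(8, 4))
    base_visual = rng.normal(size=(200, 8))
    base_audio = base_visual @ true_map + 0.01 * rng.normal(size=(200, 4))
    W = fit_hallucinator(base_visual, base_audio)

    # Synthetic 2-way 5-shot "novel class" episode with visual input only.
    centers = np.stack([np.full(8, -2.0), np.full(8, 2.0)])
    support_labels = np.repeat(np.arange(2), 5)
    support_visual = centers[support_labels] + rng.normal(size=(10, 8))
    query_labels = np.repeat(np.arange(2), 20)
    query_visual = centers[query_labels] + rng.normal(size=(40, 8))

    # Hallucinate the missing audio embeddings, then classify fused features.
    support = fuse(support_visual, support_visual @ W)
    queries = fuse(query_visual, query_visual @ W)
    preds = classify(queries, prototypes(support, support_labels, 2))
    print("episode accuracy:", (preds == query_labels).mean())
```

In this sketch the hallucinated audio is a deterministic function of the visual embedding, so it can only re-express visual information; a learned generator, as the abstract proposes, would be trained across base-class episodes rather than fitted in closed form.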
Pages: 390-398
Page count: 9