TRANSFORMER-BASED MULTI-MODAL LEARNING FOR MULTI-LABEL REMOTE SENSING IMAGE CLASSIFICATION

Times cited: 0
Authors
Hoffmann, David Sebastian [1]
Clasen, Kai Norman [1]
Demir, Begum [1, 2]
Affiliations
[1] Tech Univ Berlin, Fac Elect Engn & Comp Sci, Berlin, Germany
[2] BIFOLD Berlin Inst Fdn Learning & Data, Berlin, Germany
Funding
European Research Council
Keywords
Multi-modal fusion; multi-label image classification; deep learning; transformer; remote sensing;
DOI
10.1109/IGARSS52108.2023.10281927
Chinese Library Classification number
P [Astronomy, Earth Sciences]
Subject classification code
07
Abstract
In this paper, we introduce a novel Synchronized Class Token Fusion (SCT Fusion) architecture in the framework of multi-modal multi-label classification (MLC) of remote sensing (RS) images. The proposed architecture leverages modality-specific attention-based transformer encoders to process varying input modalities, while exchanging information across modalities by synchronizing the special class tokens after each transformer encoder block. The synchronization involves fusing the class tokens with a trainable fusion transformation, resulting in a synchronized class token that contains information from all modalities. Because the fusion transformation is trainable, it can learn an accurate representation of the features shared among the different modalities. Experimental results show the effectiveness of the proposed architecture over single-modality architectures and an early fusion multi-modal architecture when evaluated on a multi-modal MLC dataset. The code of the proposed architecture is publicly available at https://git.tu-berlin.de/rsim/sct-fusion.
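The synchronization step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract only says the fusion transformation is trainable, so the sketch assumes a single linear layer applied to the concatenated class tokens, and it assumes the class token sits at index 0 of each sequence. The function names (`linear`, `sync_class_tokens`, `sct_forward`) and the identity stand-in for the encoder blocks are hypothetical.

```python
def linear(vec, weight, bias):
    """Apply y = W x + b, with W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, vec)) + b
            for row, b in zip(weight, bias)]


def sync_class_tokens(sequences, weight, bias):
    """Fuse the class token of every modality and write the result back.

    sequences: one token sequence per modality; each token is a list of
    floats and the class token is assumed to be at index 0.
    weight, bias: parameters of the ASSUMED linear fusion, mapping the
    concatenated class tokens (num_modalities * dim) back to dim.
    """
    concatenated = [x for seq in sequences for x in seq[0]]
    fused = linear(concatenated, weight, bias)
    # Every modality continues with a copy of the same synchronized token;
    # the remaining (patch) tokens stay modality-specific.
    return [[list(fused)] + seq[1:] for seq in sequences]


def sct_forward(sequences, encoder_blocks, fusion_params):
    """Run modality-specific blocks, synchronizing after each depth.

    encoder_blocks[d][m] is the block for modality m at depth d: any
    callable mapping a token sequence to a sequence of the same shape
    (a stand-in for a real transformer encoder block).
    fusion_params[d] is the (weight, bias) pair used at depth d.
    """
    for blocks, (weight, bias) in zip(encoder_blocks, fusion_params):
        sequences = [block(seq) for block, seq in zip(blocks, sequences)]
        sequences = sync_class_tokens(sequences, weight, bias)
    return sequences[0][0]  # synchronized class token shared by all modalities


# Toy usage: two modalities, token dimension 2, identity "encoder" blocks,
# and a fusion matrix that averages the two class tokens element-wise.
def identity(seq):
    return seq

avg_weight = [[0.5, 0.0, 0.5, 0.0],
              [0.0, 0.5, 0.0, 0.5]]
avg_bias = [0.0, 0.0]

modality_a = [[1.0, 2.0], [9.0, 9.0]]              # class token + 1 patch token
modality_b = [[3.0, 4.0], [7.0, 7.0], [8.0, 8.0]]  # class token + 2 patch tokens

token = sct_forward([modality_a, modality_b],
                    encoder_blocks=[[identity, identity]],
                    fusion_params=[(avg_weight, avg_bias)])
print(token)  # [2.0, 3.0]
```

In a real model the identity blocks would be replaced by attention-based encoder blocks, and the fusion parameters would be learned jointly with them. The sequences may differ in length across modalities because only the class tokens are exchanged.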
Pages: 4891-4894
Page count: 4