Learning Modality-Invariant Features by Cross-Modality Adversarial Network for Visual Question Answering

Cited: 1
Authors
Fu, Ze [1 ,2 ]
Zheng, Changmeng [1 ,2 ]
Cai, Yi [1 ,2 ]
Li, Qing [3 ]
Wang, Tao [4 ]
Affiliations
[1] South China Univ Technol, Sch Software Engn, Guangzhou, Peoples R China
[2] MOE China, Key Lab Big Data & Intelligent Robot SCUT, Shanghai, Peoples R China
[3] Hong Kong Polytech Univ, Dept Comp, Hong Kong, Peoples R China
[4] Kings Coll London, Inst Psychiat Psychol & Neurosci, Dept Biostat & Hlth Informat, London, England
Funding
National Natural Science Foundation of China
Keywords
Visual question answering; Domain adaptation; Modality-invariant co-learning;
DOI
10.1007/978-3-030-85896-4_25
Chinese Library Classification
TP18 [Theory of Artificial Intelligence]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Visual Question Answering (VQA) is a typical multimodal task with significant development prospects in web applications. To answer a question based on the corresponding image, a VQA model needs to use information from different modalities efficiently. Although multimodal fusion methods such as the attention mechanism have made significant contributions to VQA, they attempt to co-learn multimodal features directly, ignoring the large gap between modalities and thus aligning their semantics poorly. In this paper, we propose a Cross-Modality Adversarial Network (CMAN) to address this limitation. Our method combines cross-modality adversarial learning with modality-invariant attention learning, aiming to learn modality-invariant features for better semantic alignment and higher answer-prediction accuracy. The model achieves an accuracy of 70.81% on the test-dev split of the VQA-v2 dataset. Our results also show that the model effectively narrows the gap between modalities and improves the alignment of multimodal information.
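The cross-modality adversarial idea described in the abstract can be illustrated with a toy sketch: a modality discriminator is trained to tell image features from question features, while a shared encoder is updated with the reversed gradient so the two modalities become indistinguishable. Everything below (the NumPy data, the linear encoder, the hyperparameters) is an illustrative assumption, not the paper's CMAN implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy features: "image" and "question" embeddings drawn from offset
# distributions, standing in for the modality gap described above.
img_feat = rng.normal(loc=+1.0, size=(64, 8))
txt_feat = rng.normal(loc=-1.0, size=(64, 8))

# Shared linear projection (the "encoder") and a logistic-regression
# modality discriminator over the projected features.
W_enc = rng.normal(scale=0.1, size=(8, 8))
w_dis = rng.normal(scale=0.1, size=8)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def modality_gap(W):
    # Distance between the mean projected features of the two modalities.
    return np.linalg.norm((img_feat @ W).mean(0) - (txt_feat @ W).mean(0))

gap_before = modality_gap(W_enc)

lr, lam = 0.1, 1.0
X = np.vstack([img_feat, txt_feat])
y = np.concatenate([np.ones(64), np.zeros(64)])  # 1 = image, 0 = text
for _ in range(200):
    z = X @ W_enc                     # projected features
    p = sigmoid(z @ w_dis)            # discriminator's modality prediction
    err = p - y                       # dBCE/dlogit

    # The discriminator descends on the classification loss ...
    w_dis -= lr * (z.T @ err) / len(y)

    # ... while the encoder receives the REVERSED gradient (scaled by
    # lam), pushing the two modalities toward indistinguishability.
    g_z = np.outer(err, w_dis)        # dBCE/dz
    g_W = X.T @ g_z / len(y)          # dBCE/dW_enc
    W_enc -= lr * (-lam) * g_W        # gradient reversal step

gap_after = modality_gap(W_enc)
print(gap_before, gap_after)
```

In a full model the reversal is usually implemented as a gradient-reversal layer inside an autograd framework; this sketch unrolls the same minimax update by hand to make the sign flip explicit.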
Pages: 316-331
Page count: 16