Multimodal Graph Transformer for Multimodal Question Answering

Cited by: 0
Authors
He, Xuehai [1]
Wang, Xin Eric [1]
Affiliations
[1] UC Santa Cruz, United States
Conference
17th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2023
Index terms
Computational linguistics; Natural language processing systems; Semantics
Pages: 189-200
Related papers
50 in total
  • [1] Multimodal Graph Transformer for Multimodal Question Answering
    He, Xuehai
    Wang, Xin Eric
    17th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2023, 2023: 189-200
  • [2] Multimodal Graph Transformer for Multimodal Question Answering
    He, Xuehai
    Wang, Xin Eric
    arXiv, 2023
  • [3] Multimodal Graph Reasoning and Fusion for Video Question Answering
    Zhang, Shuai
    Wang, Xingfu
    Hawbani, Ammar
    Zhao, Liang
    Alsamhi, Saeed Hamood
    2022 IEEE International Conference on Trust, Security and Privacy in Computing and Communications, TrustCom, 2022: 1410-1415
  • [4] Multimodal Graph Networks for Compositional Generalization in Visual Question Answering
    Saqur, Raeid
    Narasimhan, Karthik
    Advances in Neural Information Processing Systems 33, NeurIPS 2020, 2020, 33
  • [5] MIMOQA: Multimodal Input Multimodal Output Question Answering
    Singh, Hrituraj
    Nasery, Anshul
    Mehta, Denil
    Agarwal, Aishwarya
    Lamba, Jatin
    Srinivasan, Balaji Vasan
    2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2021), 2021: 5317-5332
  • [6] Efficient End-to-End Video Question Answering with Pyramidal Multimodal Transformer
    Peng, Min
    Wang, Chongyang
    Shi, Yu
    Zhou, Xiang-Dong
    Thirty-Seventh AAAI Conference on Artificial Intelligence, Vol 37 No 2, 2023: 2038-2046
  • [7] Multimodal Attention for Visual Question Answering
    Kodra, Lorena
    Mece, Elinda Kajo
    Intelligent Computing, Vol 1, 2019, 858: 783-792
  • [8] MMFT-BERT: Multimodal Fusion Transformer with BERT Encodings for Visual Question Answering
    Khan, Aisha Urooj
    Mazaheri, Amir
    Lobo, Niels Da Vitoria
    Shah, Mubarak
    Findings of the Association for Computational Linguistics, EMNLP 2020, 2020: 4648-4660
  • [9] Multimodal deep fusion for image question answering
    Zhang, Weifeng
    Yu, Jing
    Wang, Yuxia
    Wang, Wei
    Knowledge-Based Systems, 2021, 212
  • [10] Multimodal Learning and Reasoning for Visual Question Answering
    Ilievski, Ilija
    Feng, Jiashi
    Advances in Neural Information Processing Systems 30 (NIPS 2017), 2017, 30