Multimodal Graph Transformer for Multimodal Question Answering

Cited by: 0
Authors
He, Xuehai [1]
Wang, Xin Eric [1]
Affiliation
[1] UC Santa Cruz, United States
Source
arXiv | 2023
Keywords
Compendex;
DOI
Not available
Abstract
Semantics
Related papers
50 in total
  • [21] MUREL: Multimodal Relational Reasoning for Visual Question Answering
    Cadene, Remi
    Ben-younes, Hedi
    Cord, Matthieu
    Thome, Nicolas
    2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 1989 - 1998
  • [22] Health-Oriented Multimodal Food Question Answering
    Wang, Jianghai
    Hu, Menghao
    Song, Yaguang
    Yang, Xiaoshan
    MULTIMEDIA MODELING, MMM 2023, PT I, 2023, 13833 : 191 - 203
  • [23] Dealing with spoken requests in a multimodal Question Answering system
    Gretter, Roberto
    Kouylekov, Milen
    Negri, Matteo
    ARTIFICIAL INTELLIGENCE: METHODOLOGY, SYSTEMS, AND APPLICATIONS, 2008, 5253 : 93 - 102
  • [24] QAlayout: Question Answering Layout Based on Multimodal Attention for Visual Question Answering on Corporate Document
    Mahamoud, Ibrahim Souleiman
    Coustaty, Mickael
    Joseph, Aurelie
    d'Andecy, Vincent Poulain
    Ogier, Jean-Marc
    DOCUMENT ANALYSIS SYSTEMS, DAS 2022, 2022, 13237 : 659 - 673
  • [25] VQA-GNN: Reasoning with Multimodal Knowledge via Graph Neural Networks for Visual Question Answering
    Wang, Yanan
    Yasunaga, Michihiro
    Ren, Hongyu
    Wada, Shinya
    Leskovec, Jure
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 21525 - 21535
  • [26] Multimodal Encoders and Decoders with Gate Attention for Visual Question Answering
    Li, Haiyan
    Han, Dezhi
    COMPUTER SCIENCE AND INFORMATION SYSTEMS, 2021, 18 (03) : 1023 - 1040
  • [27] Multimodal fusion: advancing medical visual question-answering
    Mudgal, Anjali
    Kush, Udbhav
    Kumar, Aditya
    Jafari, Amir
    NEURAL COMPUTING AND APPLICATIONS, 2024, 36 (33) : 20949 - 20962
  • [28] Multimodal Question Answering over Structured Data with Ambiguous Entities
    Li, Huadong
    Wang, Yafang
    de Melo, Gerard
    Tu, Changhe
    Chen, Baoquan
    WWW'17 COMPANION: PROCEEDINGS OF THE 26TH INTERNATIONAL CONFERENCE ON WORLD WIDE WEB, 2017, : 79 - 88
  • [29] Multimodal Local Perception Bilinear Pooling for Visual Question Answering
    Lao, Mingrui
    Guo, Yanming
    Wang, Hui
    Zhang, Xin
    IEEE ACCESS, 2018, 6 : 57923 - 57932
  • [30] Dual-Key Multimodal Backdoors for Visual Question Answering
    Walmer, Matthew
    Sikka, Karan
    Sur, Indranil
    Shrivastava, Abhinav
    Jha, Susmit
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 15354 - 15364