Multimodal Graph Transformer for Multimodal Question Answering

被引:0
|
作者
He, Xuehai [1 ]
Wang, Xin Eric [1 ]
机构
[1] UC Santa Cruz, United States
来源
EACL 2023 - 17th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings of the Conference | 2023年
关键词
Compilation and indexing terms; Copyright 2024 Elsevier Inc;
D O I
17th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2023
中图分类号
学科分类号
摘要
Computational linguistics - Natural language processing systems - Semantics
引用
收藏
页码:189 / 200
相关论文
共 50 条
  • [41] Multimodal attention-driven visual question answering for Malayalam
    Kovath A.G.
    Nayyar A.
    Sikha O.K.
    Neural Computing and Applications, 2024, 36 (24) : 14691 - 14708
  • [42] Contrastive training of a multimodal encoder for medical visual question answering
    Silva, Joao Daniel
    Martins, Bruno
    Magalhaes, Joao
    INTELLIGENT SYSTEMS WITH APPLICATIONS, 2023, 18
  • [43] Visual Question Answering based on multimodal triplet knowledge accumuation
    Wang, Fengjuan
    An, Gaoyun
    2022 16TH IEEE INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING (ICSP2022), VOL 1, 2022, : 81 - 84
  • [44] Multimodal Dual Attention Memory for Video Story Question Answering
    Kim, Kyung-Min
    Choi, Seong-Ho
    Kim, Jin-Hwa
    Zhang, Byoung-Tak
    COMPUTER VISION - ECCV 2018, PT 15, 2018, 11219 : 698 - 713
  • [45] Dual-Key Multimodal Backdoors for Visual Question Answering
    Walmer, Matthew
    Sikka, Karan
    Sur, Indranil
    Shrivastava, Abhinav
    Jha, Susmit
    arXiv, 2021,
  • [46] Improving Visual Question Answering by Multimodal Gate Fusion Network
    Xiang, Shenxiang
    Chen, Qiaohong
    Fang, Xian
    Guo, Menghao
    2023 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, IJCNN, 2023,
  • [47] Speech Grammars for Textual Entailment Patterns in Multimodal Question Answering
    Sonntag, Daniel
    Sacaleanu, Bogdan
    LREC 2010 - SEVENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2010, : 3554 - 3558
  • [48] Hierarchical Conditional Relation Networks for Multimodal Video Question Answering
    Thao Minh Le
    Vuong Le
    Svetha Venkatesh
    Truyen Tran
    International Journal of Computer Vision, 2021, 129 : 3027 - 3050
  • [49] FTN-VQA: MULTIMODAL REASONING BY LEVERAGING A FULLY TRANSFORMER-BASED NETWORK FOR VISUAL QUESTION ANSWERING
    Wang, Runmin
    Xu, Weixiang
    Zhu, Yanbin
    Zhu, Zhenlin
    Chen, Hua
    Ding, Yajun
    Liu, Jinping
    Gao, Changxin
    Sang, Nong
    FRACTALS-COMPLEX GEOMETRY PATTERNS AND SCALING IN NATURE AND SOCIETY, 2023, 31 (06)
  • [50] Contrastive Video Question Answering via Video Graph Transformer
    Xiao, Junbin
    Zhou, Pan
    Yao, Angela
    Li, Yicong
    Hong, Richang
    Yan, Shuicheng
    Chua, Tat-Seng
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (11) : 13265 - 13280