Visual Question Answering on CLEVR Dataset via Multimodal Fusion and Relational Reasoning

Authors
Allahyari, Abbas [1]
Borna, Keivan [1]
Affiliations
[1] Kharazmi Univ, Dept Comp Sci, Tehran, Iran
Keywords
Visual Question Answering; Visual Reasoning; Multi-modal Learning; Deep Learning
DOI
10.1109/AIMC54250.2021.9656978
Chinese Library Classification number
O1 [Mathematics]
Discipline classification code
0701; 070101
Abstract
Visual question answering (VQA) is a new interdisciplinary research area in which a model attempts to answer a free-form natural-language question about a given image. In this paper, we improve an existing model that leverages relational networks and an attention mechanism, decreasing its training and inference time while preserving comparable accuracy.
Pages: 74-76
Number of pages: 3