Research on Visual Question Answering Based on GAT Relational Reasoning

Cited by: 0
Authors
Yalin Miao
Wenfang Cheng
Shuyun He
Hui Jiang
Affiliations
[1] Xi’an University of Technology, Department of Information Science
Source
Neural Processing Letters | 2022 / Vol. 54
Keywords
Visual question answering; Relational reasoning; Attention mechanism; Graph attention network; Multi-modal feature fusion
DOI
Not available
Abstract
The diversity of questions in VQA poses new challenges for the construction of VQA models. Existing VQA models focus on constructing new attention mechanisms, which makes them increasingly complex; moreover, most concentrate on object recognition while neglecting spatial reasoning, semantic relations, and even scene understanding. This paper therefore proposes a Graph Attention Network Relational Reasoning (GAT2R) model, which consists mainly of scene graph generation and scene graph answer prediction. The scene graph generation module extracts the regional and spatial features of objects with an object detection model and uses a relation decoder to predict the relations between object pairs. The scene graph answer prediction module dynamically updates the node representations through a question-guided graph attention network and then performs multi-modal fusion with the question features to obtain the answer. Experiments show that the proposed model achieves an accuracy of 54.45% on GQA, a natural-scene dataset centered on relational reasoning, and 68.04% on the widely used VQA2.0 dataset. Compared with the benchmark model, accuracy improves by 4.71% on GQA and 2.37% on VQA2.0, which demonstrates the effectiveness and generalization of the model.
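The question-guided attention step can be pictured with a short sketch. The code below is a minimal, hypothetical PyTorch rendering written only from the abstract's description (the paper publishes no code): the class name QuestionGuidedGAT, the layer sizes, and the concatenation-based edge scoring are illustrative assumptions, not the authors' actual architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class QuestionGuidedGAT(nn.Module):
    """One graph-attention hop whose edge weights are conditioned on the
    question. Hypothetical sketch: dimensions and the scoring function are
    assumptions, not taken from the paper."""

    def __init__(self, node_dim: int, question_dim: int, hidden_dim: int):
        super().__init__()
        self.node_proj = nn.Linear(node_dim, hidden_dim)
        self.question_proj = nn.Linear(question_dim, hidden_dim)
        # Score each directed edge from concatenated (source, target, question).
        self.attn = nn.Linear(3 * hidden_dim, 1)

    def forward(self, nodes, question, adj):
        # nodes:    (N, node_dim)   object features from the detector
        # question: (question_dim,) encoded question vector
        # adj:      (N, N)          adjacency of the predicted scene graph
        h = self.node_proj(nodes)                             # (N, H)
        q = self.question_proj(question)                      # (H,)
        n = h.size(0)
        src = h.unsqueeze(1).expand(n, n, -1)                 # node i, broadcast over j
        dst = h.unsqueeze(0).expand(n, n, -1)                 # node j, broadcast over i
        q_exp = q.expand(n, n, -1)                            # same question for all edges
        scores = self.attn(torch.cat([src, dst, q_exp], dim=-1)).squeeze(-1)
        scores = scores.masked_fill(adj == 0, float("-inf"))  # attend only along edges
        alpha = torch.nan_to_num(F.softmax(scores, dim=-1))   # NaN-safe for isolated nodes
        return F.relu(alpha @ h)                              # updated node representations

# Example usage (hypothetical dimensions): 5 detected objects,
# 2048-d region features, 1024-d question encoding.
layer = QuestionGuidedGAT(node_dim=2048, question_dim=1024, hidden_dim=512)
nodes = torch.randn(5, 2048)
question = torch.randn(1024)
adj = (torch.rand(5, 5) > 0.5).float()
updated = layer(nodes, question, adj)  # (5, 512) question-aware node features
```

In a full GAT2R-style pipeline, the updated node features would then be pooled and fused with the question embedding before a classifier scores the candidate answers; the choice of fusion operator (e.g., elementwise product) is likewise an assumption here.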
Pages: 1435-1448
Number of pages: 13
Related Papers
50 items in total
  • [1] Research on Visual Question Answering Based on GAT Relational Reasoning
    Miao, Yalin
    Cheng, Wenfang
    He, Shuyun
    Jiang, Hui
    [J]. NEURAL PROCESSING LETTERS, 2022, 54 (02): 1435-1448
  • [2] Visual question answering method based on relational reasoning and gating mechanism
    Wang X.
    Chen Q.-H.
    Sun Q.
    Jia Y.-B.
    [J]. Zhejiang Daxue Xuebao (Gongxue Ban)/Journal of Zhejiang University (Engineering Science), 2022, 56 (01): 36-46
  • [3] Relational reasoning and adaptive fusion for visual question answering
    Shen, Xiang
    Han, Dezhi
    Zong, Liang
    Guo, Zihan
    Hua, Jie
    [J]. APPLIED INTELLIGENCE, 2024, 54 (06): 5062-5080
  • [4] MUREL: Multimodal Relational Reasoning for Visual Question Answering
    Cadene, Remi
    Ben-younes, Hedi
    Cord, Matthieu
    Thome, Nicolas
    [J]. 2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019: 1989-1998
  • [5] An effective spatial relational reasoning networks for visual question answering
    Shen, Xiang
    Han, Dezhi
    Chen, Chongqing
    Luo, Gaofeng
    Wu, Zhongdai
    [J]. PLOS ONE, 2022, 17 (11)
  • [6] Multimodal feature fusion by relational reasoning and attention for visual question answering
    Zhang, Weifeng
    Yu, Jing
    Hu, Hua
    Hu, Haiyang
    Qin, Zengchang
    [J]. INFORMATION FUSION, 2020, 55: 116-126
  • [7] Cross-modal Relational Reasoning Network for Visual Question Answering
    Chen, Hongyu
    Liu, Ruifang
    Peng, Bo
    [J]. 2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW 2021), 2021: 3939-3948
  • [8] Visual Question Answering Research on Joint Knowledge and Visual Information Reasoning
    Su, Zhenqiang
    Gou, Gang
    [J]. Computer Engineering and Applications, 2024, 60 (05): 95-102
  • [9] HAIR: Hierarchical Visual-Semantic Relational Reasoning for Video Question Answering
    Liu, Fei
    Liu, Jing
    Wang, Weining
    Lu, Hanqing
    [J]. 2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021: 1678-1687
  • [10] Visual Question Answering on CLEVR Dataset via Multimodal Fusion and Relational Reasoning
    Allahyari, Abbas
    Borna, Keivan
    [J]. 2021 52ND ANNUAL IRANIAN MATHEMATICS CONFERENCE (AIMC), 2021: 74-76