Local relation network with multilevel attention for visual question answering

Cited by: 10
Authors
Sun, Bo [1 ]
Yao, Zeng [1 ]
Zhang, Yinghui [1 ]
Yu, Lejun [1 ]
Affiliations
[1] Beijing Normal Univ, Intelligent Comp & Software Res Ctr, Sch Artificial Intelligence, Beijing, Peoples R China
Keywords
Visual question answering; Relation network; Attention mechanism;
DOI
10.1016/j.jvcir.2020.102762
Chinese Library Classification
TP [Automation technology; computer technology]
Discipline classification code
0812
Abstract
With the tremendous success of visual question answering (VQA) tasks, visual attention mechanisms have become an indispensable part of VQA models. However, these attention-based methods do not consider any relationship among image regions, which is crucial for the model's thorough understanding of the image. We propose local relation networks (LRNs) for generating a context-aware feature for each image region that encodes information on its relationships with the other image regions. Furthermore, we propose a multilevel attention mechanism to combine semantic information from the LRNs and the original image regions, making the model's decisions more reasonable. With these two measures, we improve the region representation and achieve a better attention effect and better VQA performance. We conduct extensive experiments on the COCO-QA dataset and the largest benchmark, the VQA v2.0 dataset. Our model achieves competitive results, and visualizations demonstrate the effectiveness of the proposed LRNs and multilevel attention mechanism. (C) 2020 Published by Elsevier Inc.
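The abstract describes two components: a relation step that builds a context-aware feature for each region from its pairwise relations with the other regions, and a multilevel attention step in which the question attends over both the original and the relation-enhanced features. The following is a minimal NumPy sketch of that general idea, not the authors' implementation; all names (`W_rel`, the affinity scoring, the fusion by summation) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Toy setup: K image regions with d-dim features, one pooled question vector.
K, d = 4, 8
regions = rng.standard_normal((K, d))   # original region features
question = rng.standard_normal(d)       # question embedding

# Hypothetical "local relation" step: each region's context-aware feature is a
# weighted sum of pairwise-combined features with the other regions.
W_rel = rng.standard_normal((2 * d, d)) * 0.1  # stand-in for a learned layer

context_aware = np.zeros_like(regions)
for i in range(K):
    pair_scores = regions @ regions[i]             # affinity with region i
    pair_scores[i] = -np.inf                       # exclude self-relation
    alpha = softmax(pair_scores)                   # relation weights
    pairs = np.concatenate(
        [np.repeat(regions[i][None], K, 0), regions], axis=1)  # (K, 2d)
    context_aware[i] = (alpha[:, None] * np.tanh(pairs @ W_rel)).sum(0)

# "Multilevel attention" sketch: the question attends over both the original
# regions and the relation-enhanced features; the two summaries are fused.
att_orig = softmax(regions @ question)
att_rel = softmax(context_aware @ question)
fused = att_orig @ regions + att_rel @ context_aware  # (d,) joint feature

print(fused.shape)  # (8,)
```

The key design point mirrored here is that attention is applied at two levels of representation, so the final feature can draw on both a region's appearance and its relational context.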
Pages: 9
Related Papers
50 records in total
  • [1] Relation-Aware Graph Attention Network for Visual Question Answering
    Li, Linjie
    Gan, Zhe
    Cheng, Yu
    Liu, Jingjing
    [J]. 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 10312 - 10321
  • [2] CRA-Net: Composed Relation Attention Network for Visual Question Answering
    Peng, Liang
    Yang, Yang
    Wang, Zheng
    Wu, Xiao
    Huang, Zi
    [J]. PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA (MM'19), 2019, : 1202 - 1210
  • [3] Collaborative Attention Network to Enhance Visual Question Answering
    Gu, Rui
    [J]. BASIC & CLINICAL PHARMACOLOGY & TOXICOLOGY, 2019, 124 : 304 - 305
  • [4] Triple attention network for sentimental visual question answering
    Ruwa, Nelson
    Mao, Qirong
    Song, Heping
    Jia, Hongjie
    Dong, Ming
    [J]. COMPUTER VISION AND IMAGE UNDERSTANDING, 2019, 189
  • [5] ADAPTIVE ATTENTION FUSION NETWORK FOR VISUAL QUESTION ANSWERING
    Gu, Geonmo
    Kim, Seong Tae
    Ro, Yong Man
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2017, : 997 - 1002
  • [6] Fair Attention Network for Robust Visual Question Answering
    Bi, Yandong
    Jiang, Huajie
    Hu, Yongli
    Sun, Yanfeng
    Yin, Baocai
    [J]. IEEE Transactions on Circuits and Systems for Video Technology, 2024, 34 (09) : 7870 - 7881
  • [7] Co-Attention Network With Question Type for Visual Question Answering
    Yang, Chao
    Jiang, Mengqi
    Jiang, Bin
    Zhou, Weixin
    Li, Keqin
    [J]. IEEE ACCESS, 2019, 7 : 40771 - 40781
  • [8] Local self-attention in transformer for visual question answering
    Shen, Xiang
    Han, Dezhi
    Guo, Zihan
    Chen, Chongqing
    Hua, Jie
    Luo, Gaofeng
    [J]. APPLIED INTELLIGENCE, 2023, 53 (13) : 16706 - 16723
  • [9] Multi-Attention Relation Network for Figure Question Answering
    Li, Ying
    Wu, Qingfeng
    Chen, Bin
    [J]. KNOWLEDGE SCIENCE, ENGINEERING AND MANAGEMENT, PT II, 2022, 13369 : 667 - 680