Multi-scale Relational Reasoning with Regional Attention for Visual Question Answering

Cited by: 1
Authors
Ma, Yuntao [1]
Lu, Tong [1]
Wu, Yirui [2]
Affiliations
[1] Nanjing Univ, Natl Key Lab Novel Software Technol, Nanjing, Peoples R China
[2] Hohai Univ, Coll Comp & Informat, Nanjing, Peoples R China
Keywords
Visual question learning; Attention; Multi-scale relational reasoning;
DOI
10.1109/ICPR48806.2021.9413140
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
One of the main challenges of visual question answering (VQA) lies in properly reasoning about the relations among the visual regions involved in the question. In this paper, we propose a novel neural network that performs question-guided relational reasoning at multiple scales for VQA, in which each image region is enhanced by regional attention. Specifically, we present a regional attention module, consisting of a soft attention module and a hard attention module, that selects informative regions of the image according to informativeness scores computed by question-guided soft attention. Combinations of different informative regions are then concatenated with the question embedding at different scales to capture relational information. The relational reasoning module extracts question-based relational information among regions, and its multi-scale mechanism lets it model relationships of varying scale, which makes it sensitive to numbers. We conduct experiments showing that the proposed architecture is effective and achieves a new state of the art on VQA v2.
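The pipeline described in the abstract (soft attention, hard selection, multi-scale region combinations joined with the question embedding) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration rather than the authors' implementation: the dot-product scoring, the top-k selection rule, the scales `(2, 3)`, and all function names are hypothetical, and the learned relation network that would consume these inputs is omitted.

```python
import itertools
import math


def soft_attention(regions, question):
    """Question-guided soft attention (assumed form): softmax over dot products."""
    scores = [sum(r * q for r, q in zip(region, question)) for region in regions]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def hard_attention(weights, k):
    """Hard attention (assumed form): keep indices of the k highest weights."""
    ranked = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    return sorted(ranked[:k])


def multi_scale_relation_inputs(regions, question, k=3, scales=(2, 3)):
    """For each scale s, concatenate every s-subset of the selected regions
    (each scaled by its soft weight) with the question embedding."""
    weights = soft_attention(regions, question)
    kept = hard_attention(weights, k)
    attended = {i: [weights[i] * x for x in regions[i]] for i in kept}
    inputs = {}
    for s in scales:
        inputs[s] = [
            [x for i in combo for x in attended[i]] + list(question)
            for combo in itertools.combinations(kept, s)
        ]
    return kept, inputs


# Toy data: four 2-dimensional region features and a 2-dimensional question.
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.2, 0.1]]
question = [1.0, 0.5]
kept, inputs = multi_scale_relation_inputs(regions, question)
print(kept)
print({s: len(v) for s, v in inputs.items()})
```

On this toy input the three highest-scoring regions are kept, giving three pairwise inputs and one triple input. In a full model, each scale's inputs would presumably be passed through a learned relation function and aggregated before answer prediction; that part is not specified in the abstract and is left out here.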
Pages: 5642-5649
Page count: 8
Related papers
50 in total
  • [1] Multi-scale relation reasoning for multi-modal Visual Question Answering
    Wu, Yirui
    Ma, Yuntao
    Wan, Shaohua
    [J]. SIGNAL PROCESSING-IMAGE COMMUNICATION, 2021, 96
  • [2] Multimodal feature fusion by relational reasoning and attention for visual question answering
    Zhang, Weifeng
    Yu, Jing
    Hu, Hua
    Hu, Haiyang
    Qin, Zengchang
    [J]. INFORMATION FUSION, 2020, 55 : 116 - 126
  • [3] A multi-scale contextual attention network for remote sensing visual question answering
    Feng, Jiangfan
    Wang, Hui
    [J]. INTERNATIONAL JOURNAL OF APPLIED EARTH OBSERVATION AND GEOINFORMATION, 2024, 126
  • [4] Multi-Scale Progressive Attention Network for Video Question Answering
    Guo, Zhicheng
    Zhao, Jiaxuan
    Jiao, Licheng
    Liu, Xu
    Li, Lingling
    [J]. ACL-IJCNLP 2021: THE 59TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 11TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING, VOL 2, 2021, : 973 - 978
  • [5] Relational reasoning and adaptive fusion for visual question answering
    Shen, Xiang
    Han, Dezhi
    Zong, Liang
    Guo, Zihan
    Hua, Jie
    [J]. APPLIED INTELLIGENCE, 2024, 54 (06) : 5062 - 5080
  • [6] MUREL: Multimodal Relational Reasoning for Visual Question Answering
    Cadene, Remi
    Ben-younes, Hedi
    Cord, Matthieu
    Thome, Nicolas
    [J]. 2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 1989 - 1998
  • [7] Multi-modal spatial relational attention networks for visual question answering
    Yao, Haibo
    Wang, Lipeng
    Cai, Chengtao
    Sun, Yuxin
    Zhang, Zhi
    Luo, Yongkang
    [J]. IMAGE AND VISION COMPUTING, 2023, 140
  • [8] Efficient Multi-step Reasoning Attention Network for Visual Question Answering
    Zhang, Haotian
    Wu, Wei
    Zhang, Meng
    [J]. THIRTEENTH INTERNATIONAL CONFERENCE ON GRAPHICS AND IMAGE PROCESSING (ICGIP 2021), 2022, 12083
  • [9] An effective spatial relational reasoning networks for visual question answering
    Shen, Xiang
    Han, Dezhi
    Chen, Chongqing
    Luo, Gaofeng
    Wu, Zhongdai
    [J]. PLOS ONE, 2022, 17 (11):
  • [10] Research on Visual Question Answering Based on GAT Relational Reasoning
    Miao, Yalin
    Cheng, Wenfang
    He, Shuyun
    Jiang, Hui
    [J]. NEURAL PROCESSING LETTERS, 2022, 54 (02) : 1435 - 1448