Local relation network with multilevel attention for visual question answering

Cited by: 10
Authors
Sun, Bo [1 ]
Yao, Zeng [1 ]
Zhang, Yinghui [1 ]
Yu, Lejun [1 ]
Affiliations
[1] Beijing Normal Univ, Intelligent Comp & Software Res Ctr, Sch Artificial Intelligence, Beijing, Peoples R China
Keywords
Visual question answering; Relation network; Attention mechanism;
DOI
10.1016/j.jvcir.2020.102762
Chinese Library Classification
TP [Automation technology; computer technology]
Discipline classification code
0812
Abstract
With the tremendous success of visual question answering (VQA) tasks, visual attention mechanisms have become an indispensable part of VQA models. However, these attention-based methods do not consider any relationship among image regions, which is crucial for the model's thorough understanding of the image. We propose local relation networks (LRNs) for generating a context-aware feature for each image region that encodes information on its relationships with the other image regions. Furthermore, we propose a multilevel attention mechanism to combine semantic information from the LRNs and the original image regions, making the model's decisions more reasonable. With these two measures, we improve the region representation and achieve a better attention effect and better VQA performance. We conduct extensive experiments on the COCO-QA dataset and the largest benchmark, the VQA v2.0 dataset. Our model achieves competitive results, and visualizations demonstrate the effectiveness of the proposed LRNs and multilevel attention mechanism. (C) 2020 Published by Elsevier Inc.
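The abstract describes two components: a relation step that builds a context-aware feature for each region from its pairwise relations with the other regions, and a multilevel attention step in which the question attends over both the original and the relation-enhanced features. The following is a minimal NumPy sketch of that general idea, not the authors' implementation; all names (`W_rel`, the affinity scoring, the fusion by summation) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Toy setup: K image regions with d-dim features, one pooled question vector.
K, d = 4, 8
regions = rng.standard_normal((K, d))   # original region features
question = rng.standard_normal(d)       # question embedding

# Hypothetical "local relation" step: each region's context-aware feature is a
# weighted sum of pairwise-combined features with the other regions.
W_rel = rng.standard_normal((2 * d, d)) * 0.1  # stand-in for a learned layer

context_aware = np.zeros_like(regions)
for i in range(K):
    pair_scores = regions @ regions[i]             # affinity with region i
    pair_scores[i] = -np.inf                       # exclude self-relation
    alpha = softmax(pair_scores)                   # relation weights
    pairs = np.concatenate(
        [np.repeat(regions[i][None], K, 0), regions], axis=1)  # (K, 2d)
    context_aware[i] = (alpha[:, None] * np.tanh(pairs @ W_rel)).sum(0)

# "Multilevel attention" sketch: the question attends over both the original
# regions and the relation-enhanced features; the two summaries are fused.
att_orig = softmax(regions @ question)
att_rel = softmax(context_aware @ question)
fused = att_orig @ regions + att_rel @ context_aware  # (d,) joint feature

print(fused.shape)  # (8,)
```

The key design point mirrored here is that attention is applied at two levels of representation, so the final feature can draw on both a region's appearance and its relational context.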
Pages: 9
Related Papers
50 records in total
  • [1] Relation-Aware Graph Attention Network for Visual Question Answering
    Li, Linjie
    Gan, Zhe
    Cheng, Yu
    Liu, Jingjing
    [J]. 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 10312 - 10321
  • [2] CRA-Net: Composed Relation Attention Network for Visual Question Answering
    Peng, Liang
    Yang, Yang
    Wang, Zheng
    Wu, Xiao
    Huang, Zi
    [J]. PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA (MM'19), 2019, : 1202 - 1210
  • [3] Collaborative Attention Network to Enhance Visual Question Answering
    Gu, Rui
    [J]. BASIC & CLINICAL PHARMACOLOGY & TOXICOLOGY, 2019, 124 : 304 - 305
  • [4] Triple attention network for sentimental visual question answering
    Ruwa, Nelson
    Mao, Qirong
    Song, Heping
    Jia, Hongjie
    Dong, Ming
    [J]. COMPUTER VISION AND IMAGE UNDERSTANDING, 2019, 189
  • [5] ADAPTIVE ATTENTION FUSION NETWORK FOR VISUAL QUESTION ANSWERING
    Gu, Geonmo
    Kim, Seong Tae
    Ro, Yong Man
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2017, : 997 - 1002
  • [6] Fair Attention Network for Robust Visual Question Answering
    Bi, Yandong
    Jiang, Huajie
    Hu, Yongli
    Sun, Yanfeng
    Yin, Baocai
    [J]. IEEE Transactions on Circuits and Systems for Video Technology, 2024, 34 (09) : 7870 - 7881
  • [7] Co-Attention Network With Question Type for Visual Question Answering
    Yang, Chao
    Jiang, Mengqi
    Jiang, Bin
    Zhou, Weixin
    Li, Keqin
    [J]. IEEE ACCESS, 2019, 7 : 40771 - 40781
  • [8] Local self-attention in transformer for visual question answering
    Shen, Xiang
    Han, Dezhi
    Guo, Zihan
    Chen, Chongqing
    Hua, Jie
    Luo, Gaofeng
    [J]. APPLIED INTELLIGENCE, 2023, 53 (13) : 16706 - 16723
  • [9] Multi-Attention Relation Network for Figure Question Answering
    Li, Ying
    Wu, Qingfeng
    Chen, Bin
    [J]. KNOWLEDGE SCIENCE, ENGINEERING AND MANAGEMENT, PT II, 2022, 13369 : 667 - 680