Human Attention in Visual Question Answering: Do Humans and Deep Networks Look at the Same Regions?

Cited by: 195
Authors
Das, Abhishek [1 ]
Agrawal, Harsh [2 ]
Zitnick, Larry [3 ]
Parikh, Devi [1 ,3 ]
Batra, Dhruv [1 ,3 ]
Affiliations
[1] Georgia Inst Technol, Atlanta, GA 30332 USA
[2] Virginia Tech, Blacksburg, VA 24061 USA
[3] Facebook AI Res, Menlo Pk, CA USA
Funding
U.S. National Science Foundation (NSF);
Keywords
Visual Question Answering; Attention;
DOI
10.1016/j.cviu.2017.10.001
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We conduct large-scale studies on 'human attention' in Visual Question Answering (VQA) to understand where humans choose to look to answer questions about images. We design and test multiple novel, game-inspired attention-annotation interfaces that require the subject to sharpen regions of a blurred image to answer a question, and thereby introduce the VQA-HAT (Human ATtention) dataset. We evaluate attention maps generated by state-of-the-art VQA models against human attention both qualitatively (via visualizations) and quantitatively (via rank-order correlation). Our experiments show that current attention models in VQA do not seem to be looking at the same regions as humans. Finally, we train VQA models with explicit attention supervision and find that it improves VQA performance.
Pages: 90-100
Number of pages: 11
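The abstract describes comparing attention maps produced by VQA models against human attention maps quantitatively via rank-order correlation. Below is a minimal sketch of such a comparison using Spearman rank correlation; the map resolutions, the resizing step, and the function name rank_correlation are illustrative assumptions and do not reproduce the paper's exact evaluation protocol on VQA-HAT.

# Minimal sketch (assumptions noted above): Spearman rank-order correlation
# between a model-generated attention map and a human attention map.
import numpy as np
from scipy.stats import spearmanr
from scipy.ndimage import zoom

def rank_correlation(model_attn, human_attn):
    # Resize the model attention map to the human map's spatial resolution
    # (assumption: bilinear zoom; the paper's exact resizing is not reproduced).
    factors = (human_attn.shape[0] / model_attn.shape[0],
               human_attn.shape[1] / model_attn.shape[1])
    model_resized = zoom(model_attn, factors, order=1)
    # Rank-order correlation over flattened pixel values.
    rho, _ = spearmanr(model_resized.ravel(), human_attn.ravel())
    return float(rho)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    model_map = rng.random((14, 14))    # hypothetical 14x14 model attention grid
    human_map = rng.random((448, 448))  # hypothetical pixel-level human attention map
    print("rank correlation:", round(rank_correlation(model_map, human_map), 3))

Averaging such per-question correlations over a dataset of (image, question) pairs would give a single dataset-level score, which is the spirit of the quantitative comparison described in the abstract.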
Related Papers (50 in total)
  • [21] Gao, Lianli; Cao, Liangfu; Xu, Xing; Shao, Jie; Song, Jingkuan. Question-Led Object Attention for Visual Question Answering. Neurocomputing, 2020, 391: 227-233.
  • [22] Farazi, Moshiur; Khan, Salman; Barnes, Nick. Question-Agnostic Attention for Visual Question Answering. 2020 25th International Conference on Pattern Recognition (ICPR), 2021: 3542-3549.
  • [23] Shi, Yang; Furlanello, Tommaso; Zha, Sheng; Anandkumar, Animashree. Question Type Guided Attention in Visual Question Answering. Computer Vision - ECCV 2018, Part IV, 2018, 11208: 158-175.
  • [24] Alpay, Tayfun; Heinrich, Stefan; Nelskamp, Michael; Wermter, Stefan. Question Answering with Hierarchical Attention Networks. 2019 International Joint Conference on Neural Networks (IJCNN), 2019.
  • [25] Lioutas, Vasileios; Passalis, Nikolaos; Tefas, Anastasios. Visual Question Answering Using Explicit Visual Attention. 2018 IEEE International Symposium on Circuits and Systems (ISCAS), 2018.
  • [26] Qiao, Tingting; Dong, Jianfeng; Xu, Duanqing. Exploring Human-Like Attention Supervision in Visual Question Answering. Thirty-Second AAAI Conference on Artificial Intelligence (AAAI), 2018: 7300-7307.
  • [27] Le, Thao Minh; Le, Vuong; Gupta, Sunil; Venkatesh, Svetha; Tran, Truyen. Guiding Visual Question Answering with Attention Priors. 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2023: 4370-4379.
  • [28] Guo, Wenya; Zhang, Ying; Yang, Jufeng; Yuan, Xiaojie. Re-Attention for Visual Question Answering. IEEE Transactions on Image Processing, 2021, 30: 6730-6743.
  • [29] Guo, Wenya; Zhang, Ying; Wu, Xiaoping; Yang, Jufeng; Cai, Xiangrui; Yuan, Xiaojie. Re-Attention for Visual Question Answering. Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI), 2020, 34: 91-98.
  • [30] Lin, Yuetan; Pang, Zhangyang; Wang, Donghui; Zhuang, Yueting. Feature Enhancement in Attention for Visual Question Answering. Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence (IJCAI), 2018: 4216-4222.