High-Order Attention Models for Visual Question Answering

Times cited: 0

Authors:

Schwartz, Idan [1]

Schwing, Alexander G. [2]

Hazan, Tamir [3]

Affiliations:

[1] Technion, Dept Comp Sci, Haifa, Israel

[2] Univ Illinois, Dept Elect & Comp Engn, Urbana, IL USA

[3] Technion, Dept Ind Engn & Management, Haifa, Israel

Source:

ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 30 (NIPS 2017) | 2017 / Vol. 30

Funding:

Israel Science Foundation; National Science Foundation (USA)

Keywords: (none listed)

DOI: not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

The quest for algorithms that enable cognitive abilities is an important part of machine learning. A common trait in many recently investigated cognitive-like tasks is that they take into account different data modalities, such as visual and textual input. In this paper we propose a novel and generally applicable form of attention mechanism that learns high-order correlations between various data modalities. We show that high-order correlations effectively direct the appropriate attention to the relevant elements in the different data modalities that are required to solve the joint task. We demonstrate the effectiveness of our high-order attention mechanism on the task of visual question answering (VQA), where we achieve state-of-the-art performance on the standard VQA dataset.
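The abstract states that attention is directed by correlations between modalities but does not give the formulation. As a rough illustration only, the Python sketch below assumes a simplified two-modality scheme: each image region and each question word receives a unary score (a dot product with a weight vector) plus a pairwise score (its mean correlation with every element of the other modality), and a softmax turns these scores into attention weights. The function names, the hand-set weight vectors, the mean pooling of pairwise terms, and the assumption that both modalities share one feature dimension are all hypothetical choices for this sketch, not the paper's method. It stops at second-order (pairwise) terms; a higher-order variant would add analogous terms coupling three or more modalities.

```python
import math


def softmax(scores):
    # Numerically stable softmax over a non-empty list of floats
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    # Inner product of two equal-length vectors
    return sum(a * b for a, b in zip(u, v))


def pairwise_attention(image_feats, question_feats, w_img, w_q):
    """Hypothetical two-modality attention (illustrative simplification).

    Assumes non-empty inputs whose vectors share one dimension.
    Score of each element = unary term (dot with a weight vector)
    + pairwise term (mean correlation with all elements of the other modality).
    Returns one attention distribution per modality.
    """
    # corr[i][j] = <image_i, question_j>: pairwise correlation between modalities
    corr = [[dot(v, q) for q in question_feats] for v in image_feats]
    img_scores = [
        dot(v, w_img) + sum(row) / len(row)
        for v, row in zip(image_feats, corr)
    ]
    q_scores = [
        dot(q, w_q) + sum(corr[i][j] for i in range(len(image_feats))) / len(image_feats)
        for j, q in enumerate(question_feats)
    ]
    return softmax(img_scores), softmax(q_scores)


if __name__ == "__main__":
    # Toy example: region 0 aligns with the single question word, region 1 does not
    regions = [[1.0, 0.0], [0.0, 1.0]]
    words = [[1.0, 0.0]]
    a_img, a_q = pairwise_attention(regions, words, [0.0, 0.0], [0.0, 0.0])
    print(a_img, a_q)
```

In the toy run, the region whose features correlate with the question word receives the larger attention weight even though the unary weights are zero, which is the qualitative effect the abstract attributes to cross-modal correlations.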


Pages: 11

Related articles (50 in total):

[1] Beyond Bilinear: Generalized Multimodal Factorized High-Order Pooling for Visual Question Answering
Yu, Zhou; Yu, Jun; Xiang, Chenchao; Fan, Jianping; Tao, Dacheng
IEEE Transactions on Neural Networks and Learning Systems, 2018, 29(12): 5947-5959.
[2] An Improved Attention for Visual Question Answering
Rahman, Tanzila; Chou, Shih-Han; Sigal, Leonid; Carenini, Giuseppe
2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW 2021), 2021: 1653-1662.
[3] Differential Attention for Visual Question Answering
Patro, Badri; Namboodiri, Vinay P.
2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2018: 7680-7688.
[4] Multimodal Attention for Visual Question Answering
Kodra, Lorena; Mece, Elinda Kajo
Intelligent Computing, Vol. 1, 2019, 858: 783-792.
[5] Fusing Attention with Visual Question Answering
Burt, Ryan; Cudic, Mihael; Principe, Jose C.
2017 International Joint Conference on Neural Networks (IJCNN), 2017: 949-953.
[6] Question-Led Object Attention for Visual Question Answering
Gao, Lianli; Cao, Liangfu; Xu, Xing; Shao, Jie; Song, Jingkuan
Neurocomputing, 2020, 391: 227-233.
[7] Question-Agnostic Attention for Visual Question Answering
Farazi, Moshiur; Khan, Salman; Barnes, Nick
2020 25th International Conference on Pattern Recognition (ICPR), 2021: 3542-3549.
[8] Question Type Guided Attention in Visual Question Answering
Shi, Yang; Furlanello, Tommaso; Zha, Sheng; Anandkumar, Animashree
Computer Vision - ECCV 2018, Pt. IV, 2018, 11208: 158-175.
[9] Visual Question Answering Using Explicit Visual Attention
Lioutas, Vasileios; Passalis, Nikolaos; Tefas, Anastasios
2018 IEEE International Symposium on Circuits and Systems (ISCAS), 2018.
[10] Guiding Visual Question Answering with Attention Priors
Le, Thao Minh; Le, Vuong; Gupta, Sunil; Venkatesh, Svetha; Tran, Truyen
2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2023: 4370-4379.
