High-Order Attention Models for Visual Question Answering

被引:0
|
作者
Schwartz, Idan [1 ]
Schwing, Alexander G. [2 ]
Hazan, Tamir [3 ]
机构
[1] Technion, Dept Comp Sci, Haifa, Israel
[2] Univ Illinois, Dept Elect & Comp Engn, Urbana, IL USA
[3] Technion, Dept Ind Engn & Management, Haifa, Israel
基金
以色列科学基金会; 美国国家科学基金会;
关键词
D O I
暂无
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
The quest for algorithms that enable cognitive abilities is an important part of machine learning. A common trait in many recently investigated cognitive-like tasks is that they take into account different data modalities, such as visual and textual input. In this paper we propose a novel and generally applicable form of attention mechanism that learns high-order correlations between various data modalities. We show that high-order correlations effectively direct the appropriate attention to the relevant elements in the different data modalities that are required to solve the joint task. We demonstrate the effectiveness of our high-order attention mechanism on the task of visual question answering (VQA), where we achieve state-of-the-art performance on the standard VQA dataset.
引用
收藏
页数:11
相关论文
共 50 条
  • [1] Beyond Bilinear: Generalized Multimodal Factorized High-Order Pooling for Visual Question Answering
    Yu, Zhou
    Yu, Jun
    Xiang, Chenchao
    Fan, Jianping
    Tao, Dacheng
    [J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2018, 29 (12) : 5947 - 5959
  • [2] An Improved Attention for Visual Question Answering
    Rahman, Tanzila
    Chou, Shih-Han
    Sigal, Leonid
    Carenini, Giuseppe
    [J]. 2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2021, 2021, : 1653 - 1662
  • [3] Differential Attention for Visual Question Answering
    Patro, Badri
    Namboodiri, Vinay P.
    [J]. 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, : 7680 - 7688
  • [4] Multimodal Attention for Visual Question Answering
    Kodra, Lorena
    Mece, Elinda Kajo
    [J]. INTELLIGENT COMPUTING, VOL 1, 2019, 858 : 783 - 792
  • [5] Fusing Attention with Visual Question Answering
    Burt, Ryan
    Cudic, Mihael
    Principe, Jose C.
    [J]. 2017 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2017, : 949 - 953
  • [6] Question -Led object attention for visual question answering
    Gao, Lianli
    Cao, Liangfu
    Xu, Xing
    Shao, Jie
    Song, Jingkuan
    [J]. NEUROCOMPUTING, 2020, 391 : 227 - 233
  • [7] Question-Agnostic Attention for Visual Question Answering
    Farazi, Moshiur
    Khan, Salman
    Barnes, Nick
    [J]. 2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021, : 3542 - 3549
  • [8] Question Type Guided Attention in Visual Question Answering
    Shi, Yang
    Furlanello, Tommaso
    Zha, Sheng
    Anandkumar, Animashree
    [J]. COMPUTER VISION - ECCV 2018, PT IV, 2018, 11208 : 158 - 175
  • [9] Visual Question Answering using Explicit Visual Attention
    Lioutas, Vasileios
    Passalis, Nikolaos
    Tefas, Anastasios
    [J]. 2018 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS), 2018,
  • [10] Guiding Visual Question Answering with Attention Priors
    Le, Thao Minh
    Le, Vuong
    Gupta, Sunil
    Venkatesh, Svetha
    Tran, Truyen
    [J]. 2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2023, : 4370 - 4379