Question Type Guided Attention in Visual Question Answering

Cited by: 25
Authors
Shi, Yang [1]
Furlanello, Tommaso [2]
Zha, Sheng [3]
Anandkumar, Animashree [3, 4]
Affiliations
[1] Univ Calif Irvine, Irvine, CA 92697 USA
[2] Univ Southern Calif, Los Angeles, CA 90007 USA
[3] Amazon AI, Seattle, WA USA
[4] CALTECH, Pasadena, CA 91125 USA
Source
Keywords
Visual question answering; Attention; Question type; Feature selection; Multi-task;
DOI
10.1007/978-3-030-01225-0_10
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Visual Question Answering (VQA) requires integrating feature maps with drastically different structures. Image descriptors have structure at multiple spatial scales, while lexical inputs inherently follow a temporal sequence and naturally cluster into semantically different question types. Many previous works use complex models to extract feature representations but neglect high-level summary information, such as question types, during learning. In this work, we propose Question Type-guided Attention (QTA). It uses question-type information to dynamically balance bottom-up and top-down visual features, extracted from ResNet and Faster R-CNN networks, respectively. We experiment with multiple VQA architectures, with extensive input ablation studies over the TDIUC dataset, and show that QTA systematically improves performance by more than 5% across multiple question-type categories, such as "Activity Recognition", "Utility" and "Counting", compared to the state of the art. By adding QTA to the state-of-the-art MCB model, we achieve a 3% improvement in overall accuracy. Finally, we propose a multi-task extension that predicts question types, which generalizes QTA to applications that lack question-type labels, with minimal performance loss.
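The abstract describes QTA only at a high level. As a hedged illustration, the sketch below assumes QTA acts as a per-question-type channel reweighting of the concatenated ResNet and Faster R-CNN feature vectors: each question type t owns a weight vector W[t], and the gate applied to the features is sum_t p[t] * W[t], where p is a one-hot question-type vector or, for the multi-task extension, a softmax over predicted question types. The function names (`qta_features`, `qta_weights`), the toy dimensions and the hand-set weight table are hypothetical; in a real model the weights would be learned, and the paper's exact parameterization and placement of the gate may differ.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax, used here to turn hypothetical
    question-type classifier logits into a probability vector."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def qta_weights(type_probs: Sequence[float],
                weight_table: Sequence[Sequence[float]]) -> List[float]:
    """Blend per-question-type channel weights: w[c] = sum_t p[t] * W[t][c].

    With a one-hot type_probs this simply selects row W[t].
    """
    if len(type_probs) != len(weight_table):
        raise ValueError("need one weight row per question type")
    dim = len(weight_table[0])
    return [sum(p * row[c] for p, row in zip(type_probs, weight_table))
            for c in range(dim)]


def qta_features(resnet_feat: Sequence[float],
                 frcnn_feat: Sequence[float],
                 type_probs: Sequence[float],
                 weight_table: Sequence[Sequence[float]]) -> List[float]:
    """Concatenate the two visual feature vectors and rescale each channel
    by the question-type-dependent gate (assumed, hypothetical QTA form)."""
    concat = list(resnet_feat) + list(frcnn_feat)
    w = qta_weights(type_probs, weight_table)
    if len(w) != len(concat):
        raise ValueError("weight rows must match the concatenated feature size")
    return [wi * xi for wi, xi in zip(w, concat)]


if __name__ == "__main__":
    # Toy setup (hypothetical): 2 question types, 2-dim "ResNet" and
    # 2-dim "Faster R-CNN" features. Type 0 favors the Faster R-CNN half,
    # type 1 favors the ResNet half. Weights are hand-set, not learned.
    W = [[0.2, 0.2, 1.0, 1.0],
         [1.0, 1.0, 0.2, 0.2]]
    resnet, frcnn = [1.0, 2.0], [3.0, 4.0]

    # Ground-truth question type available: one-hot gate selects row 0.
    print(qta_features(resnet, frcnn, [1.0, 0.0], W))  # [0.2, 0.4, 3.0, 4.0]

    # Multi-task setting: use a predicted type distribution instead.
    predicted = softmax([0.0, 0.0])  # hypothetical classifier logits
    print(qta_features(resnet, frcnn, predicted, W))
```

With a one-hot type vector the gate reduces to a single row of the table; with a predicted distribution the rows are blended, which is how this sketch substitutes predicted question types for ground-truth labels.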
Pages: 158-175
Number of pages: 18