Question-Led object attention for visual question answering

Cited: 19
Authors
Gao, Lianli [1 ,2 ]
Cao, Liangfu [1 ,2 ]
Xu, Xing [1 ,2 ]
Shao, Jie [1 ,2 ]
Song, Jingkuan [1 ,2 ]
Affiliations
[1] Univ Elect Sci & Technol China, Ctr Future Media, Chengdu, Peoples R China
[2] Univ Elect Sci & Technol China, Sch Comp Sci, Chengdu, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Object attention; Question led; Visual question answering;
DOI
10.1016/j.neucom.2018.11.102
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The question plays a leading role in Visual Question Answering (VQA) because it specifies the particular visual objects, or conjures the vivid visual content, that the machine should attend to. However, existing approaches predominantly predict the answer from the question and the whole image, without considering the leading role of the question. Moreover, recent object spatial inference is usually conducted at the pixel level rather than the object level. Therefore, we propose a novel yet simple framework, Question-Led Object Attention (QLOB), to improve VQA performance by exploiting question semantics, fine-grained object information, and the relationship between the two modalities. First, we extract sentence semantics with a question model and use an efficient object detection network to obtain a global visual feature and local features from the top-r object region proposals. Second, our QLOB attention mechanism selects the question-related object regions. Third, we optimize the question model and the QLOB attention jointly with a softmax classifier to predict the final answer. Extensive experimental results on three public VQA datasets demonstrate that QLOB outperforms state-of-the-art methods. © 2019 Elsevier B.V.
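The abstract's second step, attention that scores each of the top-r object-region features against the question embedding and pools them into a question-led visual representation, can be sketched as below. This is a minimal illustration with NumPy; the weight names (Wq, Wv, w) and the additive tanh scoring form are assumptions for exposition, not the paper's exact parameterization.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def question_led_attention(q, V, Wq, Wv, w):
    """Score each object region against the question and pool.

    q  : (dq,)    question embedding
    V  : (r, dv)  features of the top-r object region proposals
    Wq : (dq, dh) question projection (illustrative)
    Wv : (dv, dh) region projection (illustrative)
    w  : (dh,)    scoring vector (illustrative)
    """
    # Additive attention: project both modalities into a shared
    # space, combine, and score each region with a single vector.
    scores = np.tanh(q @ Wq + V @ Wv) @ w        # (r,)
    alpha = softmax(scores)                       # attention weights over regions
    attended = alpha @ V                          # (dv,) question-led visual feature
    return attended, alpha

# Toy dimensions and random parameters for demonstration only.
rng = np.random.default_rng(0)
dq, dv, dh, r = 8, 6, 5, 4
q = rng.standard_normal(dq)
V = rng.standard_normal((r, dv))
Wq = rng.standard_normal((dq, dh))
Wv = rng.standard_normal((dv, dh))
w = rng.standard_normal(dh)

attended, alpha = question_led_attention(q, V, Wq, Wv, w)
print(alpha)  # r weights summing to 1: regions most related to the question dominate
```

In the full framework the attended feature would be fused with the global image feature and fed to the softmax answer classifier; those stages are omitted here.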
Pages: 227-233
Page count: 7
Related Papers
50 records in total
  • [1] Question-Agnostic Attention for Visual Question Answering
    Farazi, Moshiur
    Khan, Salman
    Barnes, Nick
    [J]. 2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021, : 3542 - 3549
  • [2] Question Type Guided Attention in Visual Question Answering
    Shi, Yang
    Furlanello, Tommaso
    Zha, Sheng
    Anandkumar, Animashree
    [J]. COMPUTER VISION - ECCV 2018, PT IV, 2018, 11208 : 158 - 175
  • [3] Object-Difference Attention: A Simple Relational Attention for Visual Question Answering
    Wu, Chenfei
    Liu, Jinlai
    Wang, Xiaojie
    Dong, Xuan
    [J]. PROCEEDINGS OF THE 2018 ACM MULTIMEDIA CONFERENCE (MM'18), 2018, : 519 - 527
  • [4] An Improved Attention for Visual Question Answering
    Rahman, Tanzila
    Chou, Shih-Han
    Sigal, Leonid
    Carenini, Giuseppe
    [J]. 2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2021, 2021, : 1653 - 1662
  • [5] Multimodal Attention for Visual Question Answering
    Kodra, Lorena
    Mece, Elinda Kajo
    [J]. INTELLIGENT COMPUTING, VOL 1, 2019, 858 : 783 - 792
  • [6] Differential Attention for Visual Question Answering
    Patro, Badri
    Namboodiri, Vinay P.
    [J]. 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, : 7680 - 7688
  • [7] Fusing Attention with Visual Question Answering
    Burt, Ryan
    Cudic, Mihael
    Principe, Jose C.
    [J]. 2017 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2017, : 949 - 953
  • [8] Visual Question Answering using Explicit Visual Attention
    Lioutas, Vasileios
    Passalis, Nikolaos
    Tefas, Anastasios
    [J]. 2018 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS), 2018,
  • [9] Guiding Visual Question Answering with Attention Priors
    Le, Thao Minh
    Le, Vuong
    Gupta, Sunil
    Venkatesh, Svetha
    Tran, Truyen
    [J]. 2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2023, : 4370 - 4379
  • [10] Re-Attention for Visual Question Answering
    Guo, Wenya
    Zhang, Ying
    Yang, Jufeng
    Yuan, Xiaojie
    [J]. IEEE TRANSACTIONS ON IMAGE PROCESSING, 2021, 30 : 6730 - 6743