Latent Attention Network With Position Perception for Visual Question Answering

Cited by: 0
Authors
Zhang, Jing [1 ]
Liu, Xiaoqiang [1 ]
Wang, Zhe [1 ]
Affiliations
[1] East China Univ Sci & Technol, Dept Comp Sci & Engn, Shanghai 200237, Peoples R China
Funding
Shanghai Natural Science Foundation;
Keywords
Visualization; Semantics; Glass; Feature extraction; Cognition; Question answering (information retrieval); Task analysis; Gated counting module (GCM); latent attention (LA) network; latent attention generation module (LAGM); position-aware module (PAM); visual question answering (VQA); FUSION;
DOI
10.1109/TNNLS.2024.3377636
Chinese Library Classification (CLC) Number
TP18 [Artificial Intelligence Theory];
Discipline Classification Code
081104; 0812; 0835; 1405;
Abstract
To model the complex relative-position relationships among multiple objects referenced by position prepositions in a question, we propose a novel latent attention (LA) network for visual question answering (VQA), in which LA with position perception is produced by a novel LA generation module (LAGM) and encoded together with absolute and relative position relations by our proposed position-aware module (PAM). The LAGM reconstructs the original attention into LA by capturing how visual attention shifts according to the position prepositions in the question. The LA accurately captures the complex relative-position features of multiple objects and helps the model focus attention on the correct object or region. The PAM exploits the latent state and relative position relations to enhance the model's ability to comprehend correlations among multiple objects. In addition, we propose a novel gated counting module (GCM) that strengthens sensitivity to quantitative knowledge and thus effectively improves performance on counting questions. Extensive experiments demonstrate that our method achieves excellent performance on VQA and outperforms state-of-the-art methods on the widely used VQA v2 and VQA v1 datasets.
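The record gives no implementation details, so the following is only a minimal PyTorch sketch of two ideas named in the abstract: question-guided attention biased by object position features (a stand-in for the PAM) and a sigmoid-gated fusion of a count feature (a stand-in for the GCM). All module names, dimensions, and the exact fusion scheme are illustrative assumptions, not the authors' implementation.

    # Minimal sketch, assuming Bottom-Up-style region features and a pooled
    # question embedding; module names and dimensions are hypothetical.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class PositionAwareAttention(nn.Module):
        """Question-guided attention over object features, biased by simple
        relative-position features (hypothetical stand-in for the PAM)."""
        def __init__(self, vis_dim=2048, q_dim=1024, hid_dim=512):
            super().__init__()
            self.vis_proj = nn.Linear(vis_dim, hid_dim)
            self.q_proj = nn.Linear(q_dim, hid_dim)
            self.pos_proj = nn.Linear(4, hid_dim)  # box offsets (dx, dy, dw, dh)
            self.score = nn.Linear(hid_dim, 1)

        def forward(self, vis, q, boxes):
            # vis: (B, N, vis_dim) object features, q: (B, q_dim), boxes: (B, N, 4)
            rel = boxes - boxes.mean(dim=1, keepdim=True)  # position relative to the mean box
            h = torch.tanh(self.vis_proj(vis) + self.q_proj(q).unsqueeze(1) + self.pos_proj(rel))
            attn = F.softmax(self.score(h).squeeze(-1), dim=1)        # (B, N)
            return torch.bmm(attn.unsqueeze(1), vis).squeeze(1)       # attended visual feature

    class GatedCounting(nn.Module):
        """Sigmoid gate from the question decides how much a soft count feature
        contributes to the fused representation (hypothetical GCM-like fusion)."""
        def __init__(self, q_dim=1024, feat_dim=2048):
            super().__init__()
            self.gate = nn.Linear(q_dim, feat_dim)
            self.count_proj = nn.Linear(1, feat_dim)

        def forward(self, fused, q, count):
            # fused: (B, feat_dim), q: (B, q_dim), count: (B, 1) soft object count
            g = torch.sigmoid(self.gate(q))
            return fused + g * self.count_proj(count)

    if __name__ == "__main__":
        pam, gcm = PositionAwareAttention(), GatedCounting()
        vis = torch.randn(2, 36, 2048)   # 36 region features per image
        q = torch.randn(2, 1024)         # question embedding
        boxes = torch.rand(2, 36, 4)     # normalized bounding boxes
        count = torch.rand(2, 1)         # soft count estimate
        print(gcm(pam(vis, q, boxes), q, count).shape)  # torch.Size([2, 2048])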
Pages: 1-11
Page count: 11
Related Papers
50 items in total
  • [21] ARDN: Attention Re-distribution Network for Visual Question Answering
    Yi, Jinyang
    Han, Dezhi
    Chen, Chongqing
    Shen, Xiang
    Zong, Liang
    ARABIAN JOURNAL FOR SCIENCE AND ENGINEERING, 2024,
  • [22] Co-attention graph convolutional network for visual question answering
    Liu, Chuan
    Tan, Ying-Ying
    Xia, Tian-Tian
    Zhang, Jiajing
    Zhu, Ming
    MULTIMEDIA SYSTEMS, 2023, 29 (05) : 2527 - 2543
  • [23] CAAN: Context-Aware attention network for visual question answering
    Chen, Chongqing
    Han, Dezhi
    Chang, Chin-Chen
    PATTERN RECOGNITION, 2022, 132
  • [24] GFSNet: Gaussian Fourier with sparse attention network for visual question answering
    Shen, Xiang
    Han, Dezhi
    Chang, Chin-Chen
    Oad, Ammar
    Wu, Huafeng
    ARTIFICIAL INTELLIGENCE REVIEW, 58 (6)
  • [25] Relation-Aware Graph Attention Network for Visual Question Answering
    Li, Linjie
    Gan, Zhe
    Cheng, Yu
    Liu, Jingjing
    2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 10312 - 10321
  • [27] Path-Wise Attention Memory Network for Visual Question Answering
    Xiang, Yingxin
    Zhang, Chengyuan
    Han, Zhichao
    Yu, Hao
    Li, Jiaye
    Zhu, Lei
    MATHEMATICS, 2022, 10 (18)
  • [28] MDAnet: Multiple Fusion Network with Double Attention for Visual Question Answering
    Feng, Junyi
    Gong, Ping
    Qiu, Guanghui
    ICVIP 2019: PROCEEDINGS OF 2019 3RD INTERNATIONAL CONFERENCE ON VIDEO AND IMAGE PROCESSING, 2019, : 143 - 147
  • [29] Question-Led object attention for visual question answering
    Gao, Lianli
    Cao, Liangfu
    Xu, Xing
    Shao, Jie
    Song, Jingkuan
    NEUROCOMPUTING, 2020, 391 : 227 - 233
  • [30] Question-Agnostic Attention for Visual Question Answering
    Farazi, Moshiur
    Khan, Salman
    Barnes, Nick
    2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021, : 3542 - 3549