An Answer FeedBack Network for Visual Question Answering

Cited by: 0
Authors
Tian, Weidong [1]
Tian, Ruihua [1]
Zhao, Zhongqiu [1]
Ren, Quan [1]
Affiliations
[1] Hefei Univ Technol, Sch Comp Sci & Informat Engn, Hefei, Anhui, Peoples R China
Funding
National Natural Science Foundation of China
DOI
10.1109/IJCNN54540.2023.10191079
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Recent advances have explored the power of the transformer architecture in Visual Question Answering (VQA). However, most models suffer from misalignment of multimodal features and focus on unimportant image regions when answering the given questions. To address this, we propose an Answer FeedBack Network (AFBN) that focuses on the image region features most beneficial for answering questions. The answers generated by the backbone network are fed back into the network as feedback information. We then propose a FeedBack module (FB) to control the answer feedback. Additionally, we adopt a consistency loss function to reconstruct the image region features; this loss keeps the image region features related to the question consistent with those related to the answer. Extensive experiments on the VQA-v2 benchmark dataset show that our method achieves better performance than state-of-the-art methods.
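The abstract does not give the layer-level design of AFBN, so the following is only a minimal NumPy sketch of the general idea it describes: a predicted answer is fed back to re-weight image regions, a gate controls how much of that feedback is used, and a consistency loss compares the question-guided and answer-guided region features. Every name here (`attend`, `feedback_gate`, `consistency_loss`, `w_gate`), the sigmoid gate, the dot-product attention, and the mean-squared-error loss are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attend(regions, query):
    # Re-weight region features by scaled dot-product similarity to a query.
    scores = regions @ query / np.sqrt(regions.shape[1])  # (num_regions,)
    weights = softmax(scores)
    return regions * weights[:, None]                     # (num_regions, dim)

def feedback_gate(question_vec, answer_vec, w_gate):
    # Hypothetical gate: a per-dimension sigmoid decides how much of the
    # fed-back answer embedding replaces the question embedding.
    joint = np.concatenate([question_vec, answer_vec])    # (2 * dim,)
    gate = 1.0 / (1.0 + np.exp(-(joint @ w_gate)))        # (dim,)
    return gate * answer_vec + (1.0 - gate) * question_vec

def consistency_loss(a, b):
    # Assumed form: mean squared error between two sets of region features.
    return float(np.mean((a - b) ** 2))

rng = np.random.default_rng(0)
num_regions, dim = 36, 16
regions = rng.normal(size=(num_regions, dim))      # stand-in image region features
question_vec = rng.normal(size=dim)                # stand-in question embedding
answer_vec = rng.normal(size=dim)                  # stand-in embedding of the backbone's answer
w_gate = rng.normal(size=(2 * dim, dim)) * 0.1     # hypothetical gate weights

q_regions = attend(regions, question_vec)          # first pass: question-guided
fb_query = feedback_gate(question_vec, answer_vec, w_gate)
a_regions = attend(regions, fb_query)              # second pass: answer feedback
loss = consistency_loss(q_regions, a_regions)
```

In a trainable version, `loss` would be added to the answer classification loss so that both passes are pushed toward the same image regions; how the paper weights or combines these terms is not stated in the abstract.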
Pages: 7