Bi-direction Co-Attention Network on Visual Question Answering for Blind People

Cited by: 0
Authors
Tung Le [1]
Thong Bui [2, 3]
Huy Tien Nguyen [2, 3]
Minh Le Nguyen [1]
Affiliations
[1] Japan Adv Inst Sci & Technol, Nomi, Ishikawa, Japan
[2] Univ Sci, Fac Informat Technol, Ho Chi Minh City, Vietnam
[3] Vietnam Natl Univ, Ho Chi Minh City, Vietnam
Keywords
Visual Question Answering; Visual Impairment; Bi-direction Co-Attention; Vision-language
DOI
10.1117/12.2623596
Chinese Library Classification (CLC) number
O43 [Optics]
Discipline classification code
070207; 0803
Abstract
The visually impaired community, especially blind people, needs support from advanced technologies to help them understand image content and answer questions about it. In the multi-modal area, Visual Question Answering (VQA) is a notable cutting-edge task that requires combining images and text via a co-attention mechanism. Inspired by the Deep Co-attention Layer, we propose a Bi-direction Co-Attention VT-Transformer network to learn visual and textual features jointly. In our system, the relationships and interactions among the objects of each modality are captured and combined into a meaningful space. In addition, using the Transformer architecture consistently in both the feature extractors and the multi-modal attention function is efficient enough to reduce the number of attention layers as well as the computation cost. Experimental results and ablation studies show that our model achieves promising performance compared with existing approaches and with a uni-direction mechanism on the VizWiz-VQA 2020 dataset for blind people.
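The bi-direction idea in the abstract can be illustrated with a minimal sketch. This is not the authors' VT-Transformer: it assumes plain scaled dot-product attention with no learned projections, a simple residual sum, and hypothetical names (`cross_attention`, `bidirectional_co_attention`) chosen only for this example. It shows visual tokens attending over textual tokens and, in the opposite direction, textual tokens attending over visual tokens.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys_values):
    """Each query vector attends over keys_values.

    Assumption: keys and values are the same vectors and no learned
    projection matrices are applied (a real model would learn them).
    """
    dim = len(queries[0])
    attended = []
    for q in queries:
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(dim)
            for k in keys_values
        ]
        weights = softmax(scores)
        attended.append(
            [sum(w * kv[j] for w, kv in zip(weights, keys_values))
             for j in range(dim)]
        )
    return attended


def bidirectional_co_attention(visual, textual):
    """Run attention in both directions and add a residual connection."""
    visual_ctx = cross_attention(visual, textual)   # image tokens look at words
    textual_ctx = cross_attention(textual, visual)  # words look at image tokens
    visual_out = [[x + c for x, c in zip(v, ctx)]
                  for v, ctx in zip(visual, visual_ctx)]
    textual_out = [[x + c for x, c in zip(t, ctx)]
                   for t, ctx in zip(textual, textual_ctx)]
    return visual_out, textual_out


# Toy inputs: three visual tokens and two textual tokens, dimension 2.
visual = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
textual = [[1.0, 0.0], [0.0, 1.0]]
visual_out, textual_out = bidirectional_co_attention(visual, textual)
```

A uni-direction variant would keep only one of the two `cross_attention` calls, which is the kind of baseline the abstract contrasts with the bi-direction mechanism.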
Pages: 8