Image captioning improved visual question answering

Authors
Himanshu Sharma
Anand Singh Jalal
Affiliation
[1] GLA University Mathura, Department of Computer Engineering and Applications
Source
Multimedia Tools and Applications, 2022, 81 (24)
Keywords
Visual question answering (VQA); Image captioning; Computer vision (CV); Natural language processing (NLP)
DOI
Not available
Abstract
Both Visual Question Answering (VQA) and image captioning are problems that involve the Computer Vision (CV) and Natural Language Processing (NLP) domains. In general, computer vision models are used to represent visual content, while NLP algorithms are used to represent sentences. In recent years, VQA and image captioning have been tackled independently, although they require similar types of algorithms. In this paper, a joint relationship between these two tasks is established and exploited. We present an image-captioning-based VQA model that transfers the knowledge learnt from the image captioning task to the VQA task. We integrate the image captioning module into the VQA model by fusing the features obtained from the captioning model with the attention-based visual features. The experimental results demonstrate an improvement in answer generation accuracy of 3.45% on VQA 1.0, 3.33% on VQA 2.0, and 1.73% on VQA-CP v2 over state-of-the-art VQA models.
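The fusion step described in the abstract can be illustrated with a minimal sketch. The abstract does not state which attention mechanism or fusion operator the authors use, so everything here is an assumption: dot-product attention guided by a question vector, fusion by plain concatenation, and the hypothetical names `attend` and `fuse`. The toy numbers are made up for illustration only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attend(regions, question):
    """Question-guided attention (assumed form, not taken from the paper).

    Each image region is scored by its dot product with the question
    vector; the attended feature is the softmax-weighted sum of regions.
    """
    weights = softmax([dot(r, question) for r in regions])
    dim = len(regions[0])
    attended = [sum(w * r[i] for w, r in zip(weights, regions)) for i in range(dim)]
    return attended, weights


def fuse(attended_visual, caption_feature):
    """Assumed fusion operator: concatenation of the two feature vectors."""
    return attended_visual + caption_feature


# Toy inputs: three 2-d region features, a 2-d question vector,
# and a 3-d feature standing in for the captioning model's output.
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
question = [2.0, 0.0]
caption_feature = [0.5, -0.5, 0.25]

attended, weights = attend(regions, question)
fused = fuse(attended, caption_feature)
```

In a full model, the fused vector would feed an answer classifier; concatenation is used here only because it is the simplest operator that keeps both feature sources intact.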
Pages: 34775-34796
Number of pages: 21
Related papers
  • [1] Image captioning improved visual question answering
    Sharma, Himanshu
    Jalal, Anand Singh
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2022, 81 (24) : 34775 - 34796
  • [2] Auto-Parsing Network for Image Captioning and Visual Question Answering
    Yang, Xu
    Gao, Chongyang
    Zhang, Hanwang
    Cai, Jianfei
    [J]. 2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 2177 - 2187
  • [3] Image Captioning and Visual Question Answering Based on Attributes and External Knowledge
    Wu, Qi
    Shen, Chunhua
    Wang, Peng
    Dick, Anthony
    van den Hengel, Anton
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2018, 40 (06) : 1367 - 1381
  • [4] Relation-Aware Image Captioning for Explainable Visual Question Answering
    Tseng, Ching-Shan
    Lin, Ying-Jia
    Kao, Hung-Yu
    [J]. 2022 INTERNATIONAL CONFERENCE ON TECHNOLOGIES AND APPLICATIONS OF ARTIFICIAL INTELLIGENCE, TAAI, 2022, : 149 - 154
  • [5] Fast Parameter Adaptation for Few-shot Image Captioning and Visual Question Answering
    Dong, Xuanyi
    Zhu, Linchao
    Zhang, De
    Yang, Yi
    Wu, Fei
    [J]. PROCEEDINGS OF THE 2018 ACM MULTIMEDIA CONFERENCE (MM'18), 2018, : 54 - 62
  • [6] Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering
    Anderson, Peter
    He, Xiaodong
    Buehler, Chris
    Teney, Damien
    Johnson, Mark
    Gould, Stephen
    Zhang, Lei
    [J]. 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, : 6077 - 6086
  • [7] Relation-Aware Image Captioning with Hybrid-Attention for Explainable Visual Question Answering
    Lin, Ying-Jia
    Tseng, Ching-Shan
    Kao, Hung-Yu
    [J]. JOURNAL OF INFORMATION SCIENCE AND ENGINEERING, 2024, 40 (03) : 649 - 659
  • [8] Image captioning for effective use of language models in knowledge-based visual question answering
    Salaberria, Ander
    Azkune, Gorka
    Lopez de Lacalle, Oier
    Soroa, Aitor
    Agirre, Eneko
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2023, 212
  • [9] Learning to enhance aerial video captioning with visual question answering
    Al Mehmadi, Shima M.
    Bazi, Yakoub
    Al Rahhal, Mohamad M.
    Zuair, Mansour
    [J]. INTERNATIONAL JOURNAL OF REMOTE SENSING, 2024, 45 (18) : 6395 - 6407
  • [10] An Improved Attention for Visual Question Answering
    Rahman, Tanzila
    Chou, Shih-Han
    Sigal, Leonid
    Carenini, Giuseppe
    [J]. 2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2021, 2021, : 1653 - 1662