Image captioning improved visual question answering

Cited by: 0

Authors:

Himanshu Sharma

Anand Singh Jalal

Affiliation:

[1] GLA University, Mathura, Department of Computer Engineering and Applications

Source:

Multimedia Tools and Applications | 2022, Vol. 81, No. 24

Keywords:

Visual question answering (VQA); Image captioning; Computer vision (CV); Natural language processing (NLP)

DOI:

Not available

Abstract:

Visual Question Answering (VQA) and image captioning are both problems that span the Computer Vision (CV) and Natural Language Processing (NLP) domains. In general, computer vision models are used to represent visual content, while NLP algorithms are used to represent sentences. In recent years, VQA and image captioning have been tackled independently, although they require similar types of algorithms. In this paper, a joint relationship between these two tasks is established and exploited. We present an image-captioning-based VQA model that transfers the knowledge learnt from the image captioning task to the VQA task. We integrate the image captioning module into the VQA model by fusing the features obtained from the captioning model with the attention-based visual features. The experimental results demonstrate an improvement in answer generation accuracy of 3.45% on VQA 1.0, 3.33% on VQA 2.0 and 1.73% on VQA-CP v2 over state-of-the-art VQA models.
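
The fusion step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the attention mechanism or the fusion operator, so everything below is an assumption made for illustration: dot-product attention guided by a question vector, fusion by concatenation, and the hypothetical names `attend` and `fuse` with toy feature vectors. It is not the authors' implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attend(regions, question):
    """Question-guided attention (assumed form): score each image region
    by its dot product with the question vector, normalise the scores with
    softmax, and return the weighted sum of region vectors plus the weights."""
    weights = softmax([dot(r, question) for r in regions])
    dim = len(regions[0])
    attended = [sum(w * r[i] for w, r in zip(weights, regions)) for i in range(dim)]
    return attended, weights


def fuse(attended_visual, caption_feature):
    """Fuse by concatenation (an assumption; the abstract names no operator)."""
    return attended_visual + caption_feature


# Toy inputs: three 2-d region features, a 2-d question vector,
# and a 3-d feature standing in for the captioning module's output.
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
question = [2.0, 0.0]
caption_feature = [0.5, -0.5, 0.25]

attended, weights = attend(regions, question)
fused = fuse(attended, caption_feature)
```

In a full model the fused vector would feed an answer classifier; concatenation is used here only because it is the simplest operator that keeps both feature sources intact.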

Pages: 34775-34796

Page count: 21
