iVQA: Inverse Visual Question Answering

Cited by: 24

Authors:

Liu, Feng [1]

Xiang, Tao [2]

Hospedales, Timothy M. [3]

Yang, Wankou [1]

Sun, Changyin [1]

Affiliations:

[1] Southeast Univ, Nanjing, Jiangsu, Peoples R China

[2] Queen Mary Univ London, London, England

[3] Univ Edinburgh, Edinburgh, Midlothian, Scotland

Source:

2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) | 2018

DOI:

10.1109/CVPR.2018.00898

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

We propose the inverse problem of visual question answering (iVQA) and explore its suitability as a benchmark for visuo-linguistic understanding. The iVQA task is to generate a question that corresponds to a given image and answer pair. Since the answers are less informative than the questions, and the questions have less learnable bias, an iVQA model must understand the image better than a VQA model does in order to succeed. We pose question generation as a multi-modal dynamic inference process and propose an iVQA model that gradually adjusts its focus of attention, guided by both the partially generated question and the answer. For evaluation, apart from existing linguistic metrics, we propose a new ranking metric. This metric measures the rank of the ground-truth question among a list of distractors, which allows the drawbacks of different algorithms and sources of error to be studied. Experimental results show that our model can generate diverse, grammatically correct, and content-correlated questions that match the given answer.
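The ranking metric mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify how questions are scored or how ranks are aggregated, so everything here is an assumption: the function names `ground_truth_rank`, `recall_at_k`, and `mean_rank` are hypothetical, each question is assumed to receive a scalar score from some model (higher meaning more plausible for the given image and answer), and ties are counted against the ground truth.

```python
from typing import Sequence


def ground_truth_rank(gt_score: float, distractor_scores: Sequence[float]) -> int:
    """Return the 1-based rank of the ground-truth question among distractors.

    Assumption: higher score = better match. A distractor that ties with the
    ground truth is counted as ranked above it (a pessimistic convention).
    """
    return 1 + sum(1 for s in distractor_scores if s >= gt_score)


def recall_at_k(ranks: Sequence[int], k: int) -> float:
    """Fraction of examples whose ground-truth question is ranked in the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


def mean_rank(ranks: Sequence[int]) -> float:
    """Average rank of the ground-truth question over all examples."""
    return sum(ranks) / len(ranks)


# Toy scores for three (image, answer) pairs; the values are illustrative only.
ranks = [
    ground_truth_rank(0.9, [0.2, 0.5, 0.95]),  # one distractor scores higher -> rank 2
    ground_truth_rank(0.4, [0.1, 0.3]),        # no distractor scores higher -> rank 1
    ground_truth_rank(0.1, [0.6, 0.7, 0.8]),   # all distractors score higher -> rank 4
]
```

Inspecting which distractors outrank the ground truth in a setup like this is one way the sources of error of a model could be studied, for example by checking whether the higher-ranked distractors belong to the same image or share the same answer.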

Pages: 8611-8619

Page count: 9

