Deconfounded Visual Question Generation with Causal Inference

Cited by: 1

Authors:

Chen, Jiali [1]

Guo, Zhenjun [1]

Xie, Jiayuan [1]

Cai, Yi [1]

Li, Qing [2]

Affiliations:

[1] South China University of Technology, Guangzhou, People's Republic of China

[2] Hong Kong Polytechnic University, Hong Kong, People's Republic of China

Source:

PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023 | 2023

Funding:

National Natural Science Foundation of China;

Keywords:

visual question generation; causal inference; knowledge-guided;

DOI:

10.1145/3581783.3612536

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The Visual Question Generation (VQG) task aims to generate meaningful and logically reasonable questions about a given image that target a given answer. Existing methods mainly focus on the visual concepts present in the image for question generation and have shown remarkable performance in VQG. However, these models frequently learn highly co-occurring object relationships and attributes, which is an inherent bias in question generation. This previously overlooked bias causes models to over-exploit the spurious correlations among visual features, the target answer, and the question. As a result, they may generate inappropriate questions that contradict the visual content or facts. In this paper, we first introduce a causal perspective on VQG and adopt a causal graph to analyze spurious correlations among variables. Building on this analysis, we propose a Knowledge Enhanced Causal Visual Question Generation (KECVQG) model to mitigate the impact of spurious correlations in question generation. Specifically, an interventional visual feature extractor (IVE) is introduced in KECVQG, which aims to obtain unbiased visual features through disentanglement. Then a knowledge-guided representation extractor (KRE) is employed to align the unbiased features with external knowledge. Finally, the output features from KRE are sent into a standard transformer decoder to generate questions. Extensive experiments on the VQA v2.0 and OKVQA datasets show that KECVQG significantly outperforms existing models.
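
The abstract does not say how the intervention in IVE is computed. As a generic illustration of why a causal intervention can remove a spurious correlation, the toy sketch below uses backdoor adjustment, which is only an assumed, commonly used instance and not the KECVQG method itself. The variables Z (a confounder such as a frequently co-occurring context), X (a visual feature), and Y (a question word), and all probabilities, are hypothetical.

```python
# Toy backdoor adjustment on binary variables. All numbers are made up.
# Z: hypothetical confounder, X: hypothetical visual feature, Y: hypothetical question word.

P_Z = {0: 0.5, 1: 0.5}                      # P(Z = z)
P_X1_GIVEN_Z = {0: 0.2, 1: 0.8}             # P(X = 1 | Z = z)
# P(Y = 1 | X = x, Z = z): Y depends only on Z here, so X has no causal effect on Y.
P_Y1_GIVEN_XZ = {(0, 0): 0.1, (1, 0): 0.1, (0, 1): 0.9, (1, 1): 0.9}


def p_x_given_z(x: int, z: int) -> float:
    """Return P(X = x | Z = z)."""
    return P_X1_GIVEN_Z[z] if x == 1 else 1.0 - P_X1_GIVEN_Z[z]


def observational(x: int) -> float:
    """P(Y = 1 | X = x): weights each z by P(z | x), so the confounder leaks in."""
    joint = {z: p_x_given_z(x, z) * P_Z[z] for z in P_Z}   # P(x, z)
    p_x = sum(joint.values())                              # P(x)
    return sum(P_Y1_GIVEN_XZ[(x, z)] * joint[z] for z in P_Z) / p_x


def interventional(x: int) -> float:
    """P(Y = 1 | do(X = x)) = sum_z P(Y = 1 | x, z) P(z) (backdoor adjustment)."""
    return sum(P_Y1_GIVEN_XZ[(x, z)] * P_Z[z] for z in P_Z)


if __name__ == "__main__":
    for x in (0, 1):
        print(x, observational(x), interventional(x))
```

In this toy setting the observational estimate changes with X even though X has no causal effect on Y, whereas the adjusted estimate is the same for both values of X.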

Citation

Pages: 5132-5142

Page count: 11

