Deconfounded Visual Question Generation with Causal Inference

Times cited: 1
Authors
Chen, Jiali [1]
Guo, Zhenjun [1]
Xie, Jiayuan [1]
Cai, Yi [1]
Li, Qing [2]
Affiliations
[1] South China University of Technology, Guangzhou, People's Republic of China
[2] Hong Kong Polytechnic University, Hong Kong, People's Republic of China
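Funding
National Natural Science Foundation of China
Keywords
visual question generation; causal inference; knowledge-guided
DOI
10.1145/3581783.3612536
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract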
The Visual Question Generation (VQG) task aims to generate meaningful, logically sound questions about a given image that target a given answer. Existing methods focus mainly on the visual concepts present in the image and have achieved remarkable performance on VQG. However, these models frequently learn object relationships and attributes that co-occur with high frequency, an inherent bias in question generation. This previously overlooked bias causes models to over-exploit spurious correlations among the visual features, the target answer, and the question, so they may generate inappropriate questions that contradict the visual content or known facts. In this paper, we first introduce a causal perspective on VQG and use a causal graph to analyze the spurious correlations among these variables. Building on this analysis, we propose a Knowledge Enhanced Causal Visual Question Generation (KECVQG) model to mitigate the impact of spurious correlations on question generation. Specifically, KECVQG introduces an interventional visual feature extractor (IVE), which aims to obtain unbiased visual features through disentanglement. A knowledge-guided representation extractor (KRE) then aligns these unbiased features with external knowledge. Finally, the output features of the KRE are fed into a standard Transformer decoder to generate questions. Extensive experiments on the VQA v2.0 and OKVQA datasets show that KECVQG significantly outperforms existing models.
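The abstract does not say exactly how the interventional visual feature extractor (IVE) removes spurious correlations. A standard causal-inference tool for this kind of problem is backdoor adjustment, P(Y | do(X)) = Σ_z P(Y | X, z) P(z), which averages over a confounder Z using its prior instead of letting Z's correlation with X leak into the estimate. The following is a toy sketch of that general idea only, not KECVQG's IVE: the binary variables X (an object appears), Z (a scene context that makes the object co-occur), Y (a question type), and every probability are made-up assumptions for illustration.

```python
# Toy illustration of backdoor adjustment: P(Y | do(X)) vs. P(Y | X).
# Assumption: all variables are binary and every probability below is a
# hypothetical number chosen for illustration; nothing comes from the paper.

P_Z = {0: 0.5, 1: 0.5}            # prior of the confounder Z (e.g. scene context)
P_X1_GIVEN_Z = {0: 0.1, 1: 0.9}   # P(X=1 | Z=z): Z makes object X co-occur
P_Y1_GIVEN_XZ = {                 # P(Y=1 | X=x, Z=z)
    (1, 0): 0.2, (1, 1): 0.8,
    (0, 0): 0.1, (0, 1): 0.7,
}


def p_x_given_z(x: int, z: int) -> float:
    """P(X=x | Z=z) for binary X."""
    p1 = P_X1_GIVEN_Z[z]
    return p1 if x == 1 else 1.0 - p1


def observational(x: int) -> float:
    """P(Y=1 | X=x): each stratum of Z is weighted by P(Z=z | X=x) (Bayes' rule)."""
    joint = {z: p_x_given_z(x, z) * P_Z[z] for z in P_Z}
    p_x = sum(joint.values())
    return sum(P_Y1_GIVEN_XZ[(x, z)] * joint[z] / p_x for z in P_Z)


def interventional(x: int) -> float:
    """P(Y=1 | do(X=x)) via backdoor adjustment: each stratum is weighted by the prior P(Z=z)."""
    return sum(P_Y1_GIVEN_XZ[(x, z)] * P_Z[z] for z in P_Z)


if __name__ == "__main__":
    print(f"P(Y=1 | X=1)     = {observational(1):.2f}")   # 0.74
    print(f"P(Y=1 | do(X=1)) = {interventional(1):.2f}")  # 0.50
```

Because Z pushes X and Y up together, simply conditioning on X=1 inflates the estimate (0.74), while the intervention that cuts the Z → X link gives 0.50. This gap is the kind of spurious correlation the abstract attributes to co-occurrence bias.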
Pages: 5132-5142
Number of pages: 11