Integrating Neural-Symbolic Reasoning With Variational Causal Inference Network for Explanatory Visual Question Answering

Authors
Xue D. [1]
Qian S. [1]
Xu C. [1]
Affiliations
[1] State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences, Beijing
Keywords
Causal inference; Cognition; Correlation; explainable artificial intelligence; explanatory visual question answering; Feature extraction; neural-symbolic reasoning; Predictive models; Question answering (information retrieval); Transformers; variational inference; vision-and-language; Visualization
DOI
10.1109/TPAMI.2024.3398012
Abstract
Recently, a novel multimodal reasoning task named Explanatory Visual Question Answering (EVQA) has been introduced, which combines answering visual questions with multimodal explanation generation to expound upon the underlying reasoning processes. In contrast to conventional Visual Question Answering (VQA), which merely concentrates on providing answers, EVQA aims to improve the explainability and verifiability of reasoning by providing user-friendly explanations. Despite the improved explainability of inferred results, existing EVQA models still adopt black-box neural networks to infer results, so the reasoning process itself lacks explainability. Moreover, existing EVQA models commonly predict answers and explanations in isolation, overlooking the inherent causal correlation between them. To handle these challenges, we propose a Program-guided Variational Causal Inference Network (Pro-VCIN) that integrates neural-symbolic reasoning with variational causal inference and constructs causal correlations between the predicted answers and explanations. First, we utilize pretrained models to extract visual features and convert questions into the corresponding programs. Second, we propose a multimodal program Transformer to translate programs and the related visual features into coherent and rational explanations of the reasoning processes. Finally, we propose a variational causal inference to construct the target structural causal model and predict answers based on the causal correlation to explanations. Comprehensive experiments conducted on EVQA benchmark datasets reveal the superiority of Pro-VCIN in terms of both performance and explainability over state-of-the-art EVQA methods.
Pages: 1-16
Number of pages: 15