CLEVR-X: A Visual Reasoning Dataset for Natural Language Explanations

Cited by: 8
Authors
Salewski, Leonard [1]
Koepke, A. Sophia [1]
Lensch, Hendrik P. A. [1]
Akata, Zeynep [1, 2, 3]
Affiliations
[1] Univ Tubingen, Tubingen, Germany
[2] MPI Informat, Saarbrucken, Germany
[3] MPI Intelligent Syst, Tubingen, Germany
Keywords
Visual question answering; Natural language explanations
DOI
10.1007/978-3-031-04083-2_5
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Providing explanations in the context of Visual Question Answering (VQA) presents a fundamental problem in machine learning. To obtain detailed insights into the process of generating natural language explanations for VQA, we introduce the large-scale CLEVR-X dataset that extends the CLEVR dataset with natural language explanations. For each image-question pair in the CLEVR dataset, CLEVR-X contains multiple structured textual explanations which are derived from the original scene graphs. By construction, the CLEVR-X explanations are correct and describe the reasoning and visual information that is necessary to answer a given question. We conducted a user study to confirm that the ground-truth explanations in our proposed dataset are indeed complete and relevant. We present baseline results for generating natural language explanations in the context of VQA using two state-of-the-art frameworks on the CLEVR-X dataset. Furthermore, we provide a detailed analysis of the explanation generation quality for different question and answer types. Additionally, we study the influence of using different numbers of ground-truth explanations on the convergence of natural language generation (NLG) metrics. The CLEVR-X dataset is publicly available at https://github.com/ExplainableML/CLEVR-X.
Pages: 69-88
Page count: 20