CLEVR-X: A Visual Reasoning Dataset for Natural Language Explanations

Cited by: 8
Authors
Salewski, Leonard [1]
Koepke, A. Sophia [1]
Lensch, Hendrik P. A. [1]
Akata, Zeynep [1, 2, 3]
Affiliations
[1] Univ Tubingen, Tubingen, Germany
[2] MPI Informat, Saarbrucken, Germany
[3] MPI Intelligent Syst, Tubingen, Germany
Keywords
Visual question answering; Natural language explanations
DOI
10.1007/978-3-031-04083-2_5
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Providing explanations in the context of Visual Question Answering (VQA) presents a fundamental problem in machine learning. To obtain detailed insights into the process of generating natural language explanations for VQA, we introduce the large-scale CLEVR-X dataset that extends the CLEVR dataset with natural language explanations. For each image-question pair in the CLEVR dataset, CLEVR-X contains multiple structured textual explanations which are derived from the original scene graphs. By construction, the CLEVR-X explanations are correct and describe the reasoning and visual information that is necessary to answer a given question. We conducted a user study to confirm that the ground-truth explanations in our proposed dataset are indeed complete and relevant. We present baseline results for generating natural language explanations in the context of VQA using two state-of-the-art frameworks on the CLEVR-X dataset. Furthermore, we provide a detailed analysis of the explanation generation quality for different question and answer types. Additionally, we study the influence of using different numbers of ground-truth explanations on the convergence of natural language generation (NLG) metrics. The CLEVR-X dataset is publicly available at https://github.com/ExplainableML/CLEVR-X.
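The abstract states that the explanations are derived from the original scene graphs. As a rough illustration of that idea only, the sketch below fills a fixed sentence template from a toy scene graph for a counting question. The scene format, the function names (`matching_objects`, `describe`, `explain_count`), and the template wording are all assumptions made for this example; they are not taken from the CLEVR-X generation code.

```python
from typing import Dict, List

# Hypothetical, simplified scene graph: one attribute dict per object.
# (Assumption: the real scene graphs carry more fields than shown here.)
Scene = List[Dict[str, str]]


def matching_objects(scene: Scene, **attrs: str) -> Scene:
    """Return the objects whose attributes equal every given key/value pair."""
    return [obj for obj in scene if all(obj.get(k) == v for k, v in attrs.items())]


def describe(obj: Dict[str, str]) -> str:
    """Render one object as 'a <size> <color> <material> <shape>'."""
    return f"a {obj['size']} {obj['color']} {obj['material']} {obj['shape']}"


def explain_count(scene: Scene, **attrs: str) -> str:
    """Build a templated explanation for a counting question over `attrs`."""
    found = matching_objects(scene, **attrs)
    if not found:
        return "There is no matching object."
    listed = " and ".join(describe(obj) for obj in found)
    verb = "is" if len(found) == 1 else "are"
    return f"There {verb} {listed}."


toy_scene: Scene = [
    {"size": "large", "color": "red", "material": "metal", "shape": "cube"},
    {"size": "small", "color": "red", "material": "rubber", "shape": "sphere"},
    {"size": "small", "color": "blue", "material": "rubber", "shape": "cylinder"},
]

# Explanation for a question such as "How many red things are there?"
print(explain_count(toy_scene, color="red"))
```

Because the sentence is assembled directly from the scene annotations, it is correct by construction for the toy scene, which mirrors the property the abstract claims for the dataset's explanations.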
Pages: 69 - 88
Number of pages: 20
Related papers
50 items in total
  • [1] CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning
    Johnson, Justin
    Hariharan, Bharath
    van der Maaten, Laurens
    Fei-Fei, Li
    Zitnick, C. Lawrence
    Girshick, Ross
    30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, : 1988 - 1997
  • [2] CLEVR-Math: A Dataset for Compositional Language, Visual and Mathematical Reasoning
    Lindstrom, Adam Dahlgren
    Abraham, Savitha Sam
    NEURAL-SYMBOLIC LEARNING AND REASONING, NESY 2022, 2022, : 155 - 170
  • [3] Visual Question Answering on CLEVR Dataset via Multimodal Fusion and Relational Reasoning
    Allahyari, Abbas
    Borna, Keivan
    2021 52ND ANNUAL IRANIAN MATHEMATICS CONFERENCE (AIMC), 2021, : 74 - 76
  • [4] CLEVR-Dialog: A Diagnostic Dataset for Multi-Round Reasoning in Visual Dialog
    Kottur, Satwik
    Moura, Jose M. F.
    Parikh, Devi
    Batra, Dhruv
    Rohrbach, Marcus
    2019 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL HLT 2019), VOL. 1, 2019, : 582 - 595
  • [5] A Corpus of Natural Language for Visual Reasoning
    Suhr, Alane
    Lewis, Mike
    Yeh, James
    Artzi, Yoav
    PROCEEDINGS OF THE 55TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2017), VOL 2, 2017, : 217 - 223
  • [6] BoardgameQA: A Dataset for Natural Language Reasoning with Contradictory Information
    Kazemi, Mehran
    Yuan, Quan
    Bhatia, Deepti
    Kim, Najoung
    Xu, Xin
    Imbrasaite, Vaiva
    Ramachandran, Deepak
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [7] CLEVR-Implicit: A Diagnostic Dataset for Implicit Reasoning in Referring Expression Comprehension
    Zhang, Jingwei
    Wu, Xin
    Cai, Yi
    2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2023), 2023, : 12820 - 12830
  • [8] CLEVR-XAI: A benchmark dataset for the ground truth evaluation of neural network explanations
    Arras, Leila
    Osman, Ahmed
    Samek, Wojciech
    INFORMATION FUSION, 2022, 81 : 14 - 40
  • [9] CLEVR-Ref+: Diagnosing Visual Reasoning with Referring Expressions
    Liu, Runtao
    Liu, Chenxi
    Bai, Yutong
    Yuille, Alan
    2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 4180 - 4189
  • [10] Visual Explanations of Probabilistic Reasoning
    Erwig, Martin
    Walkingshaw, Eric
    2009 IEEE SYMPOSIUM ON VISUAL LANGUAGES AND HUMAN-CENTRIC COMPUTING, PROCEEDINGS, 2009, : 23 - 27