CLEVR-X: A Visual Reasoning Dataset for Natural Language Explanations

Cited by: 8
Authors
Salewski, Leonard [1]
Koepke, A. Sophia [1]
Lensch, Hendrik P. A. [1]
Akata, Zeynep [1, 2, 3]
Affiliations
[1] Univ Tubingen, Tubingen, Germany
[2] MPI Informat, Saarbrucken, Germany
[3] MPI Intelligent Syst, Tubingen, Germany
Keywords
Visual question answering; Natural language explanations
DOI
10.1007/978-3-031-04083-2_5
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Providing explanations in the context of Visual Question Answering (VQA) presents a fundamental problem in machine learning. To obtain detailed insights into the process of generating natural language explanations for VQA, we introduce the large-scale CLEVR-X dataset that extends the CLEVR dataset with natural language explanations. For each image-question pair in the CLEVR dataset, CLEVR-X contains multiple structured textual explanations which are derived from the original scene graphs. By construction, the CLEVR-X explanations are correct and describe the reasoning and visual information that is necessary to answer a given question. We conducted a user study to confirm that the ground-truth explanations in our proposed dataset are indeed complete and relevant. We present baseline results for generating natural language explanations in the context of VQA using two state-of-the-art frameworks on the CLEVR-X dataset. Furthermore, we provide a detailed analysis of the explanation generation quality for different question and answer types. Additionally, we study the influence of using different numbers of ground-truth explanations on the convergence of natural language generation (NLG) metrics. The CLEVR-X dataset is publicly available at https://github.com/ExplainableML/CLEVR-X.
Pages: 69-88
Page count: 20