CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning

Cited by: 856
Authors
Johnson, Justin [1, 2]
Hariharan, Bharath [2]
van der Maaten, Laurens [2]
Fei-Fei, Li [1]
Zitnick, C. Lawrence [2]
Girshick, Ross [2]
Affiliations
[1] Stanford University, Stanford, CA 94305, USA
[2] Facebook AI Research, Menlo Park, CA, USA
DOI
10.1109/CVPR.2017.215
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
When building artificial intelligence systems that can reason and answer questions about visual data, we need diagnostic tests to analyze our progress and discover shortcomings. Existing benchmarks for visual question answering can help, but have strong biases that models can exploit to correctly answer questions without reasoning. They also conflate multiple sources of error, making it hard to pinpoint model weaknesses. We present a diagnostic dataset that tests a range of visual reasoning abilities. It contains minimal biases and has detailed annotations describing the kind of reasoning each question requires. We use this dataset to analyze a variety of modern visual reasoning systems, providing novel insights into their abilities and limitations.
Pages: 1988-1997
Number of pages: 10
Related papers
50 in total
  • [1] CLEVR-X: A Visual Reasoning Dataset for Natural Language Explanations
    Salewski, Leonard
    Koepke, A. Sophia
    Lensch, Hendrik P. A.
    Akata, Zeynep
    [J]. XXAI - BEYOND EXPLAINABLE AI: International Workshop, Held in Conjunction with ICML 2020, July 18, 2020, Vienna, Austria, Revised and Extended Papers, 2022, 13200 : 69 - 88
  • [2] CLEVR-Dialog: A Diagnostic Dataset for Multi-Round Reasoning in Visual Dialog
    Kottur, Satwik
    Moura, Jose M. F.
    Parikh, Devi
    Batra, Dhruv
    Rohrbach, Marcus
    [J]. 2019 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL HLT 2019), VOL. 1, 2019, : 582 - 595
  • [3] Visual Question Answering on CLEVR Dataset via Multimodal Fusion and Relational Reasoning
    Allahyari, Abbas
    Borna, Keivan
    [J]. 2021 52ND ANNUAL IRANIAN MATHEMATICS CONFERENCE (AIMC), 2021, : 74 - 76
  • [4] CLEVR-Ref+: Diagnosing Visual Reasoning with Referring Expressions
    Liu, Runtao
    Liu, Chenxi
    Bai, Yutong
    Yuille, Alan
    [J]. 2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 4180 - 4189
  • [5] Super-CLEVR: A Virtual Benchmark to Diagnose Domain Robustness in Visual Reasoning
    Li, Zhuowan
    Wang, Xingrui
    Stengel-Eskin, Elias
    Kortylewski, Adam
    Ma, Wufei
    Van Durme, Benjamin
    Yuille, Alan
    [J]. 2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 14963 - 14973
  • [6] A Benchmark for Compositional Visual Reasoning
    Zerroug, Aimen
    Vaishnav, Mohit
    Colin, Julien
    Musslick, Sebastian
    Serre, Thomas
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [7] CRIC: A VQA Dataset for Compositional Reasoning on Vision and Commonsense
    Gao, Difei
    Wang, Ruiping
    Shan, Shiguang
    Chen, Xilin
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (05) : 5561 - 5578
  • [8] Visual Programming: Compositional visual reasoning without training
    Gupta, Tanmay
    Kembhavi, Aniruddha
    [J]. 2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 14953 - 14962
  • [9] Meta Module Network for Compositional Visual Reasoning
    Chen, Wenhu
    Gan, Zhe
    Li, Linjie
    Cheng, Yu
    Wang, William
    Liu, Jingjing
    [J]. 2021 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2021), 2021, : 655 - 664
  • [10] GQA: A New Dataset for Real-World Visual Reasoning and Compositional Question Answering visualreasoning.net
    Hudson, Drew A.
    Manning, Christopher D.
    [J]. 2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 6693 - 6702