CLEVR-Dialog: A Diagnostic Dataset for Multi-Round Reasoning in Visual Dialog

Cited by: 0
Authors
Kottur, Satwik [1]
Moura, Jose M. F. [1]
Parikh, Devi [2,3]
Batra, Dhruv [2,3]
Rohrbach, Marcus [2]
Affiliations
[1] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
[2] Facebook AI Res, Menlo Pk, CA USA
[3] Georgia Inst Technol, Atlanta, GA 30332 USA
DOI
Not available
CLC number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Visual Dialog is a multimodal task of answering a sequence of questions grounded in an image, using the conversation history as context. It entails challenges in vision, language, reasoning, and grounding. However, studying these subtasks in isolation on large, real datasets is infeasible, as it requires prohibitively expensive complete annotation of the 'state' of all images and dialogs. We develop CLEVR-Dialog, a large diagnostic dataset for studying multi-round reasoning in visual dialog. Specifically, we construct a dialog grammar that is grounded in the scene graphs of the images from the CLEVR dataset. This combination results in a dataset where all aspects of the visual dialog are fully annotated. In total, CLEVR-Dialog contains 5 instances of 10-round dialogs for about 85k CLEVR images, totaling 4.25M question-answer pairs. We use CLEVR-Dialog to benchmark the performance of standard visual dialog models, in particular on visual coreference resolution as a function of coreference distance. This is the first analysis of its kind for visual dialog models, and it was not possible without this dataset. We hope the findings from CLEVR-Dialog will help inform the development of future models for visual dialog. Our code and dataset are publicly available(1).
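The abstract rests on two technical ideas: when questions are generated from a known scene graph, every answer and every coreference link is labeled automatically, and model accuracy can then be sliced by coreference distance. The Python below is a toy sketch of both ideas under loudly stated assumptions. The round types, question templates, and the names `SceneObject`, `generate_dialog`, and `accuracy_by_distance` are hypothetical and are not the CLEVR-Dialog grammar or released code; only the four attribute types (size, color, material, shape) follow CLEVR. Coreference distance is assumed here to be the number of rounds since the referent was last mentioned.

```python
from collections import defaultdict
from dataclasses import dataclass


# Hypothetical CLEVR-style scene graph node. CLEVR objects carry size, color,
# material, and shape attributes; concrete values used with it are invented.
@dataclass(frozen=True)
class SceneObject:
    size: str
    color: str
    material: str
    shape: str


def generate_dialog(scene, plan):
    """Toy dialog generator (an assumption, NOT the CLEVR-Dialog grammar).

    `plan` is a list of hypothetical round specs:
      ("describe", obj_idx, attr)  names an object explicitly
      ("follow", attr)             asks about the last named object via "its"
      ("count", shape)             scene-level question, mentions no object
    Because the scene graph is fully known, answers and coreference links are
    labeled for free. Returns (question, answer, coref_distance) triples, where
    coref_distance is the number of rounds since the referent was last
    mentioned, or 0 if the round needs no coreference resolution.
    """
    dialog = []
    last_obj, last_round = None, None
    for r, step in enumerate(plan):
        kind = step[0]
        if kind == "describe":
            _, idx, attr = step
            obj = scene[idx]
            # Assumes color + shape uniquely identify the object in this toy scene.
            question = f"What is the {attr} of the {obj.color} {obj.shape}?"
            answer, dist = getattr(obj, attr), 0
            last_obj, last_round = obj, r
        elif kind == "follow":
            _, attr = step
            if last_obj is None:
                raise ValueError("'follow' needs an earlier 'describe'")
            question = f"What is its {attr}?"
            answer, dist = getattr(last_obj, attr), r - last_round
            last_round = r  # the pronoun counts as a fresh mention
        elif kind == "count":
            _, shape = step
            question = f"How many {shape}s are there?"
            answer, dist = sum(o.shape == shape for o in scene), 0
        else:
            raise ValueError(f"unknown round type: {kind}")
        dialog.append((question, answer, dist))
    return dialog


def accuracy_by_distance(dialog, predictions):
    """Accuracy on coreference rounds only, grouped by coreference distance."""
    hits, totals = defaultdict(int), defaultdict(int)
    for (_, answer, dist), pred in zip(dialog, predictions):
        if dist > 0:  # skip rounds that need no coreference resolution
            totals[dist] += 1
            hits[dist] += int(pred == answer)
    return {d: hits[d] / totals[d] for d in sorted(totals)}
```

For example, a plan that names the red cube in round 0, asks two counting questions, and then asks "What is its size?" produces a round-3 question with coreference distance 3; the intervening rounds that mention no object are what stretch the distance. The dataset size in the abstract follows simple bookkeeping: 5 dialogs × 10 rounds × about 85k images ≈ 4.25M question-answer pairs.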
Pages: 582-595
Page count: 14