New Datasets and Models for Contextual Reasoning in Visual Dialog

Authors
Zhang, Yifeng [1 ]
Jiang, Ming [1 ]
Zhao, Qi [1 ]
Affiliations
[1] Univ Minnesota, Minneapolis, MN 55455 USA
Funding
National Science Foundation (USA)
DOI
10.1007/978-3-031-20059-5_25
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Visual Dialog (VD) is a vision-language task that requires AI systems to maintain a natural question-answering dialog about visual content. Using the dialog history as context, VD models have achieved promising performance on public benchmarks. However, prior VD datasets do not provide enough context-dependent questions, that is, questions that require knowledge from the dialog history to answer. As a result, advanced VQA models can still perform well without considering the dialog context. In this work, we focus on developing new datasets and models to highlight the role of contextual reasoning in VD. We define a hierarchy of contextual patterns to represent and organize the dialog context, enabling quantitative analyses of contextual dependencies and the design of new VD datasets and models. We then develop two new datasets, CLEVR-VD and GQA-VD, which offer context-rich dialogs over synthetic and realistic images, respectively. Furthermore, we propose a novel neural module network method featuring contextual reasoning in VD. We demonstrate the effectiveness of the proposed datasets and method with experimental results and model comparisons across different datasets. Our code and data are available at https://github.com/SuperJohnZhang/ContextVD.
Pages: 434-451
Page count: 18