New Datasets and Models for Contextual Reasoning in Visual Dialog

Cited by: 0

Authors:

Zhang, Yifeng [1]

Jiang, Ming [1]

Zhao, Qi [1]

Affiliation:

[1] Univ Minnesota, Minneapolis, MN 55455 USA

Source:

COMPUTER VISION, ECCV 2022, PT XXXVI | 2022 / Vol. 13696

Funding:

National Science Foundation (USA);

Keywords:

DOI:

10.1007/978-3-031-20059-5_25

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Visual Dialog (VD) is a vision-language task that requires AI systems to maintain a natural question-answering dialog about visual content. Using the dialog history as context, VD models have achieved promising performance on public benchmarks. However, prior VD datasets do not provide enough contextually dependent questions, that is, questions that can only be answered with knowledge from the dialog history. As a result, advanced VQA models can still perform well without considering the dialog context. In this work, we focus on developing new datasets and models to highlight the role of contextual reasoning in VD. We define a hierarchy of contextual patterns to represent and organize the dialog context, enabling quantitative analyses of contextual dependencies and the design of new VD datasets and models. We then develop two new datasets, namely CLEVR-VD and GQA-VD, offering context-rich dialogs over synthetic and realistic images, respectively. Furthermore, we propose a novel neural module network method featuring contextual reasoning in VD. We demonstrate the effectiveness of our proposed datasets and method with experimental results and model comparisons across different datasets. Our code and data are available at https://github.com/SuperJohnZhang/ContextVD.


Pages: 434-451

Number of pages: 18
