Multi-aware coreference relation network for visual dialog

Cited by: 0

Authors:

Zefan Zhang

Tianling Jiang

Chunping Liu

Yi Ji

Affiliation:

[1] Soochow University, School of Computer Science and Technology

Source:

International Journal of Multimedia Information Retrieval | 2022 / Vol. 11

Keywords:

Visual dialog; Multimedia; Coreference resolution; Cross-modal relationships;

DOI:

Not available

Abstract:

As a challenging cross-media task, visual dialog assesses whether an AI agent can converse in human language based on its understanding of visual content. The critical issue is therefore coreference, not only within vision but also within and between vision and language. In this paper, we propose the multi-aware coreference relation network (MACR-Net), which addresses this problem from both textual and visual perspectives and fuses the two in a complementary manner. Specifically, its textual coreference relation module identifies textual coreference relations based on a multi-aware textual representation from the textual view. Furthermore, the visual coreference relation module adaptively adjusts visual coreference relations based on a contextual-aware relation representation from the visual view. Finally, the multi-modal fusion module fuses the multi-aware relations to obtain an aligned representation. Extensive experiments on the VisDial v1.0 benchmark show that MACR-Net achieves state-of-the-art performance.
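
The three-stage flow described in the abstract (a textual view, a visual view, then fusion) can be illustrated with a generic sketch. This is not the MACR-Net implementation: the scaled dot-product attention, the concatenate-and-project fusion, every function name, and all dimensions and random features below are assumptions chosen only to make the data flow concrete.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attend(query, keys):
    # Scaled dot-product attention of one query vector over a set of key vectors
    # (an assumed stand-in for a coreference relation module).
    scores = keys @ query / np.sqrt(query.shape[0])
    weights = softmax(scores)
    return weights @ keys, weights

def fuse(text_vec, visual_vec, W):
    # Hypothetical fusion: concatenate both views and project to one joint vector.
    return np.tanh(W @ np.concatenate([text_vec, visual_vec]))

rng = np.random.default_rng(0)
d = 8                                # assumed feature size
question = rng.normal(size=d)        # current question embedding (random placeholder)
history = rng.normal(size=(5, d))    # 5 earlier dialog rounds (random placeholder)
regions = rng.normal(size=(10, d))   # 10 image region features (random placeholder)
W = rng.normal(size=(d, 2 * d))      # untrained projection weights

# Textual view: which earlier rounds does the question refer back to?
text_ctx, hist_w = attend(question, history)
# Visual view: which regions match the question once textual context is added?
vis_ctx, reg_w = attend(question + text_ctx, regions)
# Fusion: one aligned representation built from both views.
joint = fuse(text_ctx, vis_ctx, W)
```

In this sketch the visual attention is conditioned on the textual context, which is one simple way to let the two views inform each other; the paper's actual modules may differ substantially.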

Pages: 567-576

Page count: 9

50 items in total

[41] Multi-view semantic understanding for visual dialog
Jiang, Tianling
Zhang, Zefan
Li, Xin
Ji, Yi
Liu, Chunping
KNOWLEDGE-BASED SYSTEMS, 2023, 268
[42] Recurrent Attention Network with Reinforced Generator for Visual Dialog
Fan, Hehe
Zhu, Linchao
Yang, Yi
Wu, Fei
ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2020, 16 (03)
[43] Reciprocal question representation learning network for visual dialog
Zhang, Hongwei
Wang, Xiaojie
Jiang, Si
APPLIED INTELLIGENCE, 2023, 53 (05) : 4924 - 4939
[44] Heterogeneous Excitation-and-Squeeze Network for visual dialog
Lin, Bingqian
Zhu, Yi
Liang, Xiaodan
NEUROCOMPUTING, 2021, 449 : 399 - 410
[45] RAN: A Relation-aware Network for Relation Extraction
Li, Yile
Gu, Xiaoyan
Yue, Yinliang
Wang, Zhuo
Li, Bo
Wang, Weiping
2023 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, IJCNN, 2023
[46] CLEVR-Dialog: A Diagnostic Dataset for Multi-Round Reasoning in Visual Dialog
Kottur, Satwik
Moura, Jose M. F.
Parikh, Devi
Batra, Dhruv
Rohrbach, Marcus
2019 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL HLT 2019), VOL. 1, 2019 : 582 - 595
[47] Relation-Aware Alignment Attention Network for Multi-view Multi-label Learning
Zhang, Yi
Shen, Jundong
Yu, Cheng
Wang, Chongjun
DATABASE SYSTEMS FOR ADVANCED APPLICATIONS (DASFAA 2021), PT II, 2021, 12682 : 465 - 482
[48] DialogMCF: Multimodal Context Flow for Audio Visual Scene-Aware Dialog
Chen, Zhe
Liu, Hongcheng
Wang, Yu
IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2024, 32 : 753 - 764
[49] Relation-aware Graph Convolutional Networks for Multi-relational Network Alignment
Fang, Yujie
Li, Xin
Ye, Rui
Tan, Xiaoyan
Zhao, Peiyao
Wang, Mingzhong
ACM TRANSACTIONS ON INTELLIGENT SYSTEMS AND TECHNOLOGY, 2023, 14 (02)
[50] Relation-Aware Multi-Pass Comparison Deconfounded Network for Change Captioning
Lu, Zhicong
Jin, Li
Chen, Ziwei
Tian, Changyuan
Sun, Xian
Li, Xiaoyu
Zhang, Yi
Li, Qi
Xu, Guangluan
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (12) : 13349 - 13363
