Multi-aware coreference relation network for visual dialog

Cited by: 0

Authors:

Zefan Zhang

Tianling Jiang

Chunping Liu

Yi Ji

Affiliation:

[1] Soochow University, School of Computer Science and Technology

Source:

International Journal of Multimedia Information Retrieval | 2022 / Vol. 11

Keywords:

Visual dialog; Multimedia; Coreference resolution; Cross-modal relationships

DOI:

Not available

Abstract:

As a challenging cross-media task, visual dialog assesses whether an AI agent can converse in human language based on its understanding of visual content. The critical issue is therefore to address not only coreference within vision, but also coreference within and across vision and language. In this paper, we propose the multi-aware coreference relation network (MACR-Net), which tackles this problem from both textual and visual perspectives and fuses the two views with complementary awareness. Specifically, the textual coreference relation module identifies textual coreference relations based on a multi-aware textual representation from the textual view. The visual coreference relation module adaptively adjusts visual coreference relations based on a context-aware relation representation from the visual view. Finally, the multi-modal fusion module fuses the multi-aware relations to obtain an aligned representation. Extensive experiments on the VisDial v1.0 benchmark show that MACR-Net achieves state-of-the-art performance.

Pages: 567-576

Page count: 9
