Multi-aware coreference relation network for visual dialog

Cited by: 0
Authors
Zefan Zhang
Tianling Jiang
Chunping Liu
Yi Ji
Affiliation
[1] School of Computer Science and Technology, Soochow University
Keywords
Visual dialog; Multimedia; Coreference resolution; Cross-modal relationships
DOI
Not available
Abstract
As a challenging cross-media task, visual dialog assesses whether an AI agent can converse in human language based on its understanding of visual content. The critical issue is therefore to address not only coreference within vision but also coreference within and between vision and language. In this paper, we propose the multi-aware coreference relation network (MACR-Net), which tackles this problem from both textual and visual perspectives and fuses the resulting complementary forms of awareness. Specifically, from the textual view, its textual coreference relation module identifies textual coreference relations based on a multi-aware textual representation. From the visual view, its visual coreference relation module adaptively adjusts visual coreference relations based on a context-aware relation representation. Finally, the multi-modal fusion module fuses these multi-aware relations to obtain an aligned representation. Extensive experiments on the VisDial v1.0 benchmark show that MACR-Net achieves state-of-the-art performance.
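The abstract describes a three-stage pipeline (textual coreference, visual coreference, multi-modal fusion) without giving equations. The following is a minimal toy sketch in pure Python of how such a pipeline could be wired together, assuming scaled dot-product attention for both coreference steps and a scalar sigmoid gate for fusion. These are generic stand-in techniques chosen for illustration, not MACR-Net's actual modules; every function name, vector, and the gate weight `gate_w` are hypothetical.

```python
import math
from typing import List, Tuple

# Hypothetical toy illustration only; this is NOT the MACR-Net implementation.
Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max score for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, keys: List[Vector]) -> Tuple[List[float], Vector]:
    """Scaled dot-product attention of one query over a list of key vectors.

    Keys double as values here; returns the weights and the weighted sum.
    """
    d = len(query)
    weights = softmax([dot(query, k) / math.sqrt(d) for k in keys])
    context = [sum(w * k[i] for w, k in zip(weights, keys)) for i in range(d)]
    return weights, context


def gated_fusion(text_vec: Vector, vis_vec: Vector, gate_w: Vector) -> Vector:
    """Assumed scalar gate: g = sigmoid(gate_w . (text + vis)), fused = g*text + (1-g)*vis."""
    summed = [t + v for t, v in zip(text_vec, vis_vec)]
    g = 1.0 / (1.0 + math.exp(-dot(gate_w, summed)))
    return [g * t + (1.0 - g) * v for t, v in zip(text_vec, vis_vec)]


def toy_coreference_step(pronoun: Vector, history: List[Vector],
                         regions: List[Vector], gate_w: Vector):
    # 1. Textual view: resolve the pronoun against entities in the dialog history.
    text_weights, text_ctx = attend(pronoun, history)
    # 2. Visual view: ground the resolved textual context in image region features.
    vis_weights, vis_ctx = attend(text_ctx, regions)
    # 3. Fusion: combine both views into one aligned representation.
    fused = gated_fusion(text_ctx, vis_ctx, gate_w)
    return text_weights, vis_weights, fused


# Usage with made-up 3-d embeddings.
pronoun = [1.0, 0.0, 0.0]                     # "it"
history = [[0.9, 0.1, 0.0], [0.0, 1.0, 0.0]]  # "the dog", "the park"
regions = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]  # two image regions
gate_w = [0.0, 0.0, 0.0]                      # zero weights give g = 0.5

text_w, vis_w, fused = toy_coreference_step(pronoun, history, regions, gate_w)
# "it" attends more strongly to "the dog" than to "the park".
print("textual weights:", text_w)
print("visual weights:", vis_w)
print("fused:", fused)
```

With zero gate weights the fusion reduces to a plain average of the two views; a learned gate would instead let the model decide, per example, how much to trust the textual versus the visual evidence.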
Pages: 567-576
Number of pages: 9