Multi-aware coreference relation network for visual dialog

Cited by: 0
Authors
Zefan Zhang
Tianling Jiang
Chunping Liu
Yi Ji
Affiliation
[1] School of Computer Science and Technology, Soochow University
Keywords
Visual dialog; Multimedia; Coreference resolution; Cross-modal relationships
DOI
Not available
Chinese Library Classification number
Subject classification number
Abstract
As a challenging cross-media task, visual dialog assesses whether an AI agent can converse in human language based on its understanding of visual content. The critical issue is therefore to attend not only to coreference within vision, but also to coreference within language and between vision and language. In this paper, we propose the multi-aware coreference relation network (MACR-Net), which addresses this issue from both textual and visual perspectives and fuses the two views with complementary awareness. Specifically, its textual coreference relation module identifies textual coreference relations based on a multi-aware textual representation from the textual view. The visual coreference relation module adaptively adjusts visual coreference relations based on a contextual-aware relation representation from the visual view. Finally, the multi-modal fusion module fuses the multi-aware relations to obtain an aligned representation. Extensive experiments on the VisDial v1.0 benchmark show that MACR-Net achieves state-of-the-art performance.
Pages: 567-576
Number of pages: 9