Multi-aware coreference relation network for visual dialog

Cited by: 0
Authors
Zefan Zhang
Tianling Jiang
Chunping Liu
Yi Ji
Affiliation
School of Computer Science and Technology, Soochow University
Keywords
Visual dialog; Multimedia; Coreference resolution; Cross-modal relationships
DOI
Not available
Abstract
As a challenging cross-media task, visual dialog assesses whether an AI agent can converse in human language based on its understanding of visual content. A critical issue is coreference: references must be resolved not only within the visual modality, but also within language and across the two modalities. In this paper, we propose the Multi-Aware Coreference Relation Network (MACR-Net), which addresses coreference from both the textual and the visual perspective and fuses the two complementary awareness streams. Specifically, its textual coreference relation module identifies textual coreference relations based on a multi-aware textual representation from the textual view. The visual coreference relation module adaptively adjusts visual coreference relations based on a context-aware relation representation from the visual view. Finally, the multi-modal fusion module fuses the multi-aware relations into an aligned representation. Extensive experiments on the VisDial v1.0 benchmark show that MACR-Net achieves state-of-the-art performance.
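The abstract names three components: a textual coreference relation module, a visual coreference relation module, and a multi-modal fusion module. The sketch below is a minimal, assumption-laden PyTorch illustration of how such a three-module pipeline could be wired together; the specific layer choices (multi-head cross-attention, mean pooling, an MLP fusion head) and all dimensions are placeholders for illustration and are not taken from the paper.

```python
# Structural sketch only: module internals, sizes, and attention choices are
# assumptions; the paper's actual MACR-Net architecture is not specified here.
import torch
import torch.nn as nn


class TextualCoreferenceRelation(nn.Module):
    """Relates the current question to the dialog history (textual view)."""

    def __init__(self, dim: int = 512, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, question: torch.Tensor, history: torch.Tensor) -> torch.Tensor:
        # Question tokens attend over history tokens to resolve textual coreference.
        ctx, _ = self.attn(question, history, history)
        return self.norm(question + ctx)


class VisualCoreferenceRelation(nn.Module):
    """Adjusts relations among visual regions conditioned on the dialog context."""

    def __init__(self, dim: int = 512, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, regions: torch.Tensor, text_ctx: torch.Tensor) -> torch.Tensor:
        # Visual regions attend over the context-aware textual representation.
        ctx, _ = self.attn(regions, text_ctx, text_ctx)
        return self.norm(regions + ctx)


class MultiModalFusion(nn.Module):
    """Fuses the two awareness streams into a single aligned representation."""

    def __init__(self, dim: int = 512):
        super().__init__()
        self.proj = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, text_feat: torch.Tensor, vis_feat: torch.Tensor) -> torch.Tensor:
        # Pool each stream and fuse by concatenation + MLP (one simple choice).
        fused = torch.cat([text_feat.mean(1), vis_feat.mean(1)], dim=-1)
        return self.proj(fused)


if __name__ == "__main__":
    B, Lq, Lh, R, D = 2, 12, 40, 36, 512   # batch, question/history/region lengths, feature dim
    question = torch.randn(B, Lq, D)
    history = torch.randn(B, Lh, D)
    regions = torch.randn(B, R, D)

    q_aware = TextualCoreferenceRelation(D)(question, history)  # textual coreference view
    v_aware = VisualCoreferenceRelation(D)(regions, q_aware)    # visual coreference view
    aligned = MultiModalFusion(D)(q_aware, v_aware)             # aligned representation
    print(aligned.shape)                                        # torch.Size([2, 512])
```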
Pages: 567-576 (9 pages)
Related papers
50 records in total
  • [1] Multi-aware coreference relation network for visual dialog
    Zhang, Zefan
    Jiang, Tianling
    Liu, Chunping
    Ji, Yi
    INTERNATIONAL JOURNAL OF MULTIMEDIA INFORMATION RETRIEVAL, 2022, 11 (04) : 567 - 576
  • [2] A multi-aware graph convolutional network for driver drowsiness detection
    Lin, Liang
    Wang, Song
    Yang, Jucheng
    Wei, Feng
    KNOWLEDGE-BASED SYSTEMS, 2024, 305
  • [3] Modeling Coreference Relations in Visual Dialog
    Li, Mingxiao
    Moens, Marie-Francine
    16TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (EACL 2021), 2021, : 3306 - 3318
  • [4] GoG: Relation-aware Graph-over-Graph Network for Visual Dialog
    Chen, Feilong
    Chen, Xiuyi
    Meng, Fandong
    Li, Peng
    Zhou, Jie
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL-IJCNLP 2021, 2021, : 230 - 243
  • [5] Knowledge-Aware Causal Inference Network for Visual Dialog
    Zhang, Zefan
    Liu, Chunping
    Ji, Yi
    PROCEEDINGS OF THE 2023 ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, ICMR 2023, 2023, : 253 - 261
  • [6] Visual Coreference Resolution in Visual Dialog Using Neural Module Networks
    Kottur, Satwik
    Moura, Jose M. F.
    Parikh, Devi
    Batra, Dhruv
    Rohrbach, Marcus
    COMPUTER VISION - ECCV 2018, PT 15, 2018, 11219 : 160 - 178
  • [7] Textual-Visual Reference-Aware Attention Network for Visual Dialog
    Guo, Dan
    Wang, Hui
    Wang, Shuhui
    Wang, Meng
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2020, 29 : 6655 - 6666
  • [8] Multi-View Attention Network for Visual Dialog
    Park, Sungjin
    Whang, Taesun
    Yoon, Yeochan
    Lim, Heuiseok
    APPLIED SCIENCES-BASEL, 2021, 11 (07):
  • [9] VD-PCR: Improving visual dialog with pronoun coreference resolution
    Yu, Xintong
    Zhang, Hongming
    Hong, Ruixin
    Song, Yangqiu
    Zhang, Changshui
    PATTERN RECOGNITION, 2022, 125