Multimodal high-order relational network for vision-and-language tasks

Cited by: 6
Authors
Pan, Hao [1, 2]
Huang, Jun [1, 2]
Affiliations
[1] Univ Chinese Acad Sci, Beijing 100049, Peoples R China
[2] Chinese Acad Sci, Shanghai Adv Res Inst, Shanghai 201210, Peoples R China
Funding
National Key Research and Development Program of China;
Keywords
High-order relations; Vision-and-language tasks;
DOI
10.1016/j.neucom.2022.03.071
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Vision-and-language tasks require the understanding and learning of visual semantic relations, language syntactic relations and mutual relations between these two modalities. Existing methods focus only on intra-modality low-order relations by simply combining pairwise features, while ignoring the intra-modality high-order relations and the sophisticated correlations between visual and textual relations. We thus propose the multimodal high-order relational network (MORN) to simultaneously capture the intra-modality high-order relations and the sophisticated correlations between visual and textual relations. The MORN model consists of three modules. A coarse-to-fine visual relation encoder first captures the fully-connected relations between all visual objects, and then refines the local relations between neighboring objects. Moreover, a textual relation encoder is used to capture the syntactic relations between text words. Finally, a relational multimodal transformer is designed to align the multimodal representations and model sophisticated correlations between textual and visual relations. Our proposed approach shows state-of-the-art performance on two vision-and-language tasks, including visual question answering (VQA) and visual grounding (VG). (c) 2022 Elsevier B.V. All rights reserved.
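Illustrative sketch (not from the paper): the abstract does not say how the coarse-to-fine visual relation encoder is implemented. The NumPy code below is only one possible reading of the idea, assuming that the coarse stage is plain self-attention over all objects and that the fine stage is the same attention restricted to each object's k spatially nearest neighbors. The function names, the use of box centers, the parameter `k`, and the absence of learned projections are all assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(feats, mask=None):
    # Single-head scaled dot-product self-attention, no learned projections
    # (a simplification; a real model would learn query/key/value maps).
    d = feats.shape[-1]
    scores = feats @ feats.T / np.sqrt(d)
    if mask is not None:
        # Pairs outside the mask get a very low score, so their weight is 0.
        scores = np.where(mask, scores, -1e9)
    weights = softmax(scores, axis=-1)
    return weights @ feats, weights

def neighbor_mask(centers, k):
    # Hypothetical neighborhood rule: object j is a neighbor of i if it is
    # among the k objects closest to i by box-center distance (self included).
    dist = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=-1)
    nearest = np.argsort(dist, axis=-1)[:, :k]
    mask = np.zeros(dist.shape, dtype=bool)
    np.put_along_axis(mask, nearest, True, axis=-1)
    return mask

def coarse_to_fine(feats, centers, k=3):
    # Coarse stage: every object attends to every other object.
    coarse, w_coarse = attention(feats)
    # Fine stage: attention is limited to spatial neighbors.
    fine, w_fine = attention(coarse, neighbor_mask(centers, k))
    return fine, w_coarse, w_fine

rng = np.random.default_rng(0)
feats = rng.normal(size=(5, 8))     # 5 objects, 8-dim features (made-up data)
centers = rng.uniform(size=(5, 2))  # made-up box centers in [0, 1)^2
fine, w_coarse, w_fine = coarse_to_fine(feats, centers, k=3)
```

In this sketch, each row of `w_coarse` spreads weight over all five objects, while each row of `w_fine` is nonzero only for the three nearest objects, which is the contrast between fully connected and local relations that the abstract describes.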
Pages: 62-75
Page count: 14
Related papers
50 in total
  • [31] Multimodal High-order Relation Transformer for Scene Boundary Detection
    Wei, Xi
    Shi, Zhangxiang
    Zhang, Tianzhu
    Yu, Xiaoyuan
    Xiao, Lei
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 22024 - 22033
  • [32] Deep Multimodal Multilinear Fusion with High-order Polynomial Pooling
    Hou, Ming
    Tang, Jiajia
    Zhang, Jianhai
    Kong, Wanzeng
    Zhao, Qibin
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [33] High-order Correlation Network for Video Recognition
    Dong, Wei
    Wang, Zhenwei
    Zhang, Bingbing
    Zhang, Jianxin
    Zhang, Qiang
    2022 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2022
  • [34] OPTOELECTRONIC HIGH-ORDER FEEDBACK NEURAL NETWORK
    SELVIAH, DR
    MAO, ZQ
    MIDWINTER, JE
    ELECTRONICS LETTERS, 1990, 26 (23) : 1954 - 1955
  • [35] Social contagion in high-order network with mutation
    Li, Tianyu
    Wu, Yong
    Ding, Qianming
    Xie, Ying
    Yu, Dong
    Yang, Lijian
    Jia, Ya
    CHAOS SOLITONS & FRACTALS, 2024, 180
  • [36] Relational metric learning with high-order neighborhood interactions for social recommendation
    Zhen Liu
    Xiaodong Wang
    Ying Ma
    Xinxin Yang
    Knowledge and Information Systems, 2022, 64 : 1525 - 1547
  • [37] MATHEMATICAL SCIENCE LIBRARY WRITTEN IN A HIGH-ORDER LANGUAGE
    BURLAKOFF, M
    SIAM REVIEW, 1978, 20 (03) : 620 - 620
  • [38] Compositional Dependencies and High-Order Relations Formulas at Relational Data Models
    Rodionov, A. N.
    AUTOMATIC DOCUMENTATION AND MATHEMATICAL LINGUISTICS, 2024, 58 (05) : 320 - 332
  • [39] Relational metric learning with high-order neighborhood interactions for social recommendation
    Liu, Zhen
    Wang, Xiaodong
    Ma, Ying
    Yang, Xinxin
    KNOWLEDGE AND INFORMATION SYSTEMS, 2022, 64 (06) : 1525 - 1547
  • [40] Scalable Deep Generative Relational Models with High-Order Node Dependence
    Fan, Xuhui
    Li, Bin
    Sisson, Scott A.
    Li, Caoyuan
    Chen, Ling
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32