Multimodal high-order relational network for vision-and-language tasks

Cited by: 6
Authors
Pan, Hao [1, 2]
Huang, Jun [1, 2]
Affiliations
[1] Univ Chinese Acad Sci, Beijing 100049, Peoples R China
[2] Chinese Acad Sci, Shanghai Adv Res Inst, Shanghai 201210, Peoples R China
Funding
National Key Research and Development Program of China
Keywords
High-order relations; Vision-and-language tasks;
DOI
10.1016/j.neucom.2022.03.071
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Vision-and-language tasks require understanding and learning visual semantic relations, language syntactic relations, and the mutual relations between these two modalities. Existing methods focus only on intra-modality low-order relations by simply combining pairwise features, while ignoring the intra-modality high-order relations and the sophisticated correlations between visual and textual relations. We thus propose the multimodal high-order relational network (MORN) to simultaneously capture the intra-modality high-order relations and the sophisticated correlations between visual and textual relations. The MORN model consists of three modules. A coarse-to-fine visual relation encoder first captures the fully-connected relations between all visual objects, and then refines the local relations between neighboring objects. Moreover, a textual relation encoder is used to capture the syntactic relations between text words. Finally, a relational multimodal transformer is designed to align the multimodal representations and model sophisticated correlations between textual and visual relations. Our proposed approach shows state-of-the-art performance on two vision-and-language tasks: visual question answering (VQA) and visual grounding (VG). (c) 2022 Elsevier B.V. All rights reserved.
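The coarse-to-fine idea in the abstract (relate every object to every other object, then refine among nearby objects only) can be illustrated with a small NumPy sketch. This is not the MORN implementation: the abstract gives no equations, so the use of plain dot-product self-attention, the k-nearest-neighbor rule based on box centers, and all names (attention, neighbor_mask, coarse_to_fine, k) are assumptions made purely for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax; entries set to -inf receive zero weight.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(feats, mask=None):
    # Single-head dot-product self-attention without learned projections
    # (an illustrative simplification, not the paper's encoder).
    d = feats.shape[-1]
    scores = feats @ feats.T / np.sqrt(d)
    if mask is not None:
        scores = np.where(mask, scores, -np.inf)
    return softmax(scores) @ feats

def neighbor_mask(centers, k):
    # Hypothetical neighborhood rule: each object may attend to itself
    # and to its k nearest objects, measured between box centers.
    dist = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=-1)
    nearest = np.argsort(dist, axis=1)[:, :k + 1]  # k neighbors plus self
    mask = np.zeros(dist.shape, dtype=bool)
    np.put_along_axis(mask, nearest, True, axis=1)
    return mask

def coarse_to_fine(feats, centers, k=2):
    coarse = attention(feats)                          # all-pairs relations
    return attention(coarse, neighbor_mask(centers, k))  # local refinement

rng = np.random.default_rng(0)
feats = rng.normal(size=(5, 8))      # 5 objects, 8-dimensional features
centers = rng.uniform(size=(5, 2))   # normalized (x, y) box centers
print(coarse_to_fine(feats, centers).shape)
```

In this sketch the second pass reuses the output of the first, so the local refinement operates on features that already mix in global context; whether the published model chains its stages this way is not stated in the abstract.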
Pages: 62-75
Number of pages: 14