Multi-Modal Interaction Graph Convolutional Network for Temporal Language Localization in Videos

Cited by: 22
Authors
Zhang, Zongmeng [1 ]
Han, Xianjing [1 ]
Song, Xuemeng [1 ]
Yan, Yan [2 ]
Nie, Liqiang [1 ]
Affiliations
[1] Shandong Univ, Sch Comp Sci & Technol, Qingdao 266237, Peoples R China
[2] IIT, Dept Comp Sci, Chicago, IL 60616 USA
Funding
National Natural Science Foundation of China;
Keywords
Videos; Location awareness; Task analysis; Semantics; Syntactics; Convolution; Cognition; Temporal language localization; graph convolutional network; video and language; NEURAL-NETWORK;
DOI
10.1109/TIP.2021.3113791
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper focuses on tackling the problem of temporal language localization in videos, which aims to identify the start and end points of a moment described by a natural language sentence in an untrimmed video. However, it is non-trivial since it requires not only the comprehensive understanding of the video and sentence query, but also the accurate semantic correspondence capture between them. Existing efforts are mainly centered on exploring the sequential relation among video clips and query words to reason the video and sentence query, neglecting the other intra-modal relations (e.g., semantic similarity among video clips and syntactic dependency among the query words). Towards this end, in this work, we propose a Multi-modal Interaction Graph Convolutional Network (MIGCN), which jointly explores the complex intra-modal relations and inter-modal interactions residing in the video and sentence query to facilitate the understanding and semantic correspondence capture of the video and sentence query. In addition, we devise an adaptive context-aware localization method, where the context information is taken into the candidate moments and the multi-scale fully connected layers are designed to rank and adjust the boundary of the generated coarse candidate moments with different lengths. Extensive experiments on Charades-STA and ActivityNet datasets demonstrate the promising performance and superior efficiency of our model.
Pages: 8265 - 8277
Number of pages: 13
Related papers
50 in total
  • [31] Multi-Modal Convolutional Parameterisation Network for Guided Image Inverse Problems
    Czerkawski, Mikolaj
    Upadhyay, Priti
    Davison, Christopher
    Atkinson, Robert
    Michie, Craig
    Andonovic, Ivan
    Macdonald, Malcolm
    Cardona, Javier
    Tachtatzis, Christos
    JOURNAL OF IMAGING, 2024, 10 (03)
  • [32] MSTGC: Multi-Channel Spatio-Temporal Graph Convolution Network for Multi-Modal Brain Networks Fusion
    Xu, Ruting
    Zhu, Qi
    Li, Shengrong
    Hou, Zhenghua
    Shao, Wei
    Zhang, Daoqiang
    IEEE TRANSACTIONS ON NEURAL SYSTEMS AND REHABILITATION ENGINEERING, 2023, 31 : 2359 - 2369
  • [33] MMGCN: Multi-modal multi-view graph convolutional networks for cancer prognosis prediction
    Yang, Ping
    Chen, Wengxiang
    Qiu, Hang
    COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE, 2024, 257
  • [34] Multi-level Interaction Network for Multi-Modal Rumor Detection
    Zou, Ting
    Qian, Zhong
    Li, Peifeng
    2023 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, IJCNN, 2023,
  • [35] On Graph Calculi for Multi-modal Logics
    Veloso, Paulo A. S.
    Veloso, Sheila R. M.
    Benevides, Mario R. F.
    ELECTRONIC NOTES IN THEORETICAL COMPUTER SCIENCE, 2015, 312 : 231 - 252
  • [36] Heterogeneous multi-modal graph network for arterial travel time prediction
    Jie Fang
    Hangyu He
    Mengyun Xu
    Xiongwei Wu
    Applied Intelligence, 2025, 55 (6)
  • [37] Semantic2Graph: graph-based multi-modal feature fusion for action segmentation in videos
    Junbin Zhang
    Pei-Hsuan Tsai
    Meng-Hsun Tsai
    Applied Intelligence, 2024, 54 : 2084 - 2099
  • [38] An enhanced multi-modal brain graph network for classifying neuropsychiatric disorders
    Liu, Liangliang
    Wang, Yu-Ping
    Wang, Yi
    Zhang, Pei
    Xiong, Shufeng
    MEDICAL IMAGE ANALYSIS, 2022, 81
  • [39] Integrated Heterogeneous Graph Attention Network for Incomplete Multi-modal Clustering
    Wang, Yu
    Yao, Xinjie
    Zhu, Pengfei
    Li, Weihao
    Cao, Meng
    Hu, Qinghua
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2024, 132 (09) : 3847 - 3866
  • [40] Heterogeneous-Grained Multi-Modal Graph Network for Outfit Recommendation
    Xu, Rucong
    Wang, Jianfeng
    Li, Yun
    IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE, 2024, 8 (02) : 1788 - 1799