Multi-Modal Interaction Graph Convolutional Network for Temporal Language Localization in Videos

Citations: 22

Authors:

Zhang, Zongmeng [1]

Han, Xianjing [1]

Song, Xuemeng [1]

Yan, Yan [2]

Nie, Liqiang [1]

Affiliations:

[1] Shandong Univ, Sch Comp Sci & Technol, Qingdao 266237, Peoples R China

[2] IIT, Dept Comp Sci, Chicago, IL 60616 USA

Source:

IEEE TRANSACTIONS ON IMAGE PROCESSING | 2021 / Vol. 30

Funding:

National Natural Science Foundation of China;

Keywords:

Videos; Location awareness; Task analysis; Semantics; Syntactics; Convolution; Cognition; Temporal language localization; graph convolutional network; video and language; NEURAL-NETWORK;

DOI:

10.1109/TIP.2021.3113791

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This paper focuses on tackling the problem of temporal language localization in videos, which aims to identify the start and end points of a moment described by a natural language sentence in an untrimmed video. However, it is non-trivial since it requires not only the comprehensive understanding of the video and sentence query, but also the accurate semantic correspondence capture between them. Existing efforts are mainly centered on exploring the sequential relation among video clips and query words to reason the video and sentence query, neglecting the other intra-modal relations (e.g., semantic similarity among video clips and syntactic dependency among the query words). Towards this end, in this work, we propose a Multi-modal Interaction Graph Convolutional Network (MIGCN), which jointly explores the complex intra-modal relations and inter-modal interactions residing in the video and sentence query to facilitate the understanding and semantic correspondence capture of the video and sentence query. In addition, we devise an adaptive context-aware localization method, where the context information is taken into the candidate moments and the multi-scale fully connected layers are designed to rank and adjust the boundary of the generated coarse candidate moments with different lengths. Extensive experiments on Charades-STA and ActivityNet datasets demonstrate the promising performance and superior efficiency of our model.
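The intra-modal relation mentioned in the abstract (semantic similarity among video clips) can be illustrated with a generic graph convolution step. The following is a minimal sketch under stated assumptions, not the MIGCN implementation: the function names, the cosine-similarity threshold, the row normalisation, and the ReLU activation are all illustrative choices that the abstract does not specify.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def similarity_adjacency(feats, threshold=0.5):
    """Build a clip graph: connect clips whose cosine similarity reaches
    the (hypothetical) non-negative threshold; every clip also gets a
    self-loop of weight 1."""
    n = len(feats)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                adj[i][j] = 1.0
            else:
                s = cosine(feats[i], feats[j])
                adj[i][j] = s if s >= threshold else 0.0
    return adj


def graph_conv(feats, adj, weight):
    """One graph convolution step: average neighbour features using the
    row-normalised adjacency, apply a linear map, then ReLU."""
    n, d_in, d_out = len(feats), len(feats[0]), len(weight[0])
    out = []
    for i in range(n):
        degree = sum(adj[i])  # at least 1 because of the self-loop
        agg = [sum(adj[i][j] * feats[j][k] for j in range(n)) / degree
               for k in range(d_in)]
        proj = [sum(agg[k] * weight[k][m] for k in range(d_in))
                for m in range(d_out)]
        out.append([max(0.0, x) for x in proj])
    return out
```

In this sketch each clip is a node, edges link semantically similar clips, and one propagation step lets every clip feature absorb information from its neighbours. A graph over query words built from syntactic dependencies could reuse `graph_conv` with a different adjacency matrix.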

Pages: 8265-8277

Page count: 13
