Multi-Modal Interaction Graph Convolutional Network for Temporal Language Localization in Videos

Cited by: 22

Authors:

Zhang, Zongmeng [1]

Han, Xianjing [1]

Song, Xuemeng [1]

Yan, Yan [2]

Nie, Liqiang [1]

Affiliations:

[1] Shandong Univ, Sch Comp Sci & Technol, Qingdao 266237, Peoples R China

[2] IIT, Dept Comp Sci, Chicago, IL 60616 USA

Source:

IEEE TRANSACTIONS ON IMAGE PROCESSING | 2021 / Vol. 30

Funding:

National Natural Science Foundation of China;

Keywords:

Videos; Location awareness; Task analysis; Semantics; Syntactics; Convolution; Cognition; Temporal language localization; graph convolutional network; video and language; NEURAL-NETWORK;

DOI:

10.1109/TIP.2021.3113791

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This paper tackles the problem of temporal language localization in videos, which aims to identify the start and end points of the moment described by a natural language sentence in an untrimmed video. The task is non-trivial because it requires not only a comprehensive understanding of both the video and the sentence query, but also accurate capture of the semantic correspondence between them. Existing efforts mainly explore the sequential relations among video clips and among query words to reason about the video and the sentence query, neglecting other intra-modal relations (e.g., semantic similarity among video clips and syntactic dependency among query words). To this end, we propose a Multi-modal Interaction Graph Convolutional Network (MIGCN), which jointly explores the complex intra-modal relations and inter-modal interactions residing in the video and the sentence query, facilitating both their understanding and the capture of their semantic correspondence. In addition, we devise an adaptive context-aware localization method, in which context information is incorporated into the candidate moments and multi-scale fully connected layers are designed to rank and adjust the boundaries of the generated coarse candidate moments of different lengths. Extensive experiments on the Charades-STA and ActivityNet datasets demonstrate the promising performance and superior efficiency of our model.
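The abstract names the intra-modal relations but not the equations that model them. As a rough, hedged illustration only (not the authors' MIGCN implementation), the sketch below assumes clip features are plain vectors and builds two hypothetical clip graphs: one from temporal adjacency (the sequential relation prior work relies on) and one from cosine similarity among clips (the semantic relation the abstract says is neglected). It then applies one standard graph convolution layer with symmetric normalization, in the style of Kipf and Welling. The function names `sequential_adjacency`, `similarity_adjacency`, `gcn_layer` and the `threshold` value are illustrative assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def sequential_adjacency(n):
    """Hypothetical temporal graph: each clip links to itself and its immediate neighbours."""
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(max(0, i - 1), min(n, i + 2)):
            adj[i][j] = 1.0
    return adj


def similarity_adjacency(clips, threshold=0.5):
    """Hypothetical semantic graph: link clips whose cosine similarity >= threshold, plus self-loops."""
    n = len(clips)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j or cosine(clips[i], clips[j]) >= threshold:
                adj[i][j] = 1.0
    return adj


def normalize(adj):
    """Symmetric normalization D^{-1/2} A D^{-1/2}; self-loops keep every degree >= 1."""
    deg = [sum(row) for row in adj]
    n = len(adj)
    return [[adj[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Plain-Python matrix product of a (m x k) and b (k x p)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def gcn_layer(adj, feats, weight):
    """One graph convolution: H' = ReLU(A_hat @ H @ W), with A_hat the normalized adjacency."""
    propagated = matmul(normalize(adj), feats)
    return [[max(0.0, x) for x in row] for row in matmul(propagated, weight)]


# Usage: three toy clip features; clips 0 and 1 are semantically close, clip 2 is not.
clips = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]  # identity weight so the propagation effect is visible
out = gcn_layer(similarity_adjacency(clips), clips, identity)
```

With the similarity graph, clips 0 and 1 average their features while clip 2 stays isolated; passing `sequential_adjacency(3)` instead would mix every clip with its temporal neighbours regardless of content. The query-side syntactic dependency graph and the inter-modal interactions described in the abstract are not modeled in this sketch.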


Pages: 8265-8277

Number of pages: 13
