Semantic Collaborative Learning for Cross-Modal Moment Localization

Cited by: 0

Authors:

Hu, Yupeng [1]

Wang, Kun [1]

Liu, Meng [2]

Tang, Haoyu [1]

Nie, Liqiang [3]

Affiliations:

[1] Shandong Univ, Sch Software, Jinan 250101, Shandong, Peoples R China

[2] Shandong Jianzhu Univ, Sch Comp Sci & Technol, Jinan 250101, Shandong, Peoples R China

[3] Harbin Inst Technol Shenzhen, Sch Comp Sci & Technol, Shenzhen 518055, Peoples R China

Source:

ACM TRANSACTIONS ON INFORMATION SYSTEMS | 2024 / Vol. 42 / No. 02

Keywords:

Cross-modal moment localization; intra-modal semantic understanding; inter-modal semantic collaboration; VIDEO; NETWORK; TEXT;

DOI:

10.1145/3620669

Chinese Library Classification:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

Localizing a desired moment within an untrimmed video via a given natural language query, i.e., cross-modal moment localization, has recently attracted widespread research attention. However, it is a challenging task because it requires not only accurately understanding intra-modal semantic information, but also explicitly capturing inter-modal semantic correlations (consistency and complementarity). Existing efforts mainly focus on intra-modal semantic understanding and inter-modal semantic alignment, while ignoring the necessary semantic supplement. To address this gap, we present a cross-modal semantic perception network for more effective intra-modal semantic understanding and inter-modal semantic collaboration. Concretely, we design a dual-path representation network for intra-modal semantic modeling. Meanwhile, we develop a semantic collaborative network to achieve multi-granularity semantic alignment and hierarchical semantic supplement. Effective moment localization can thereby be achieved based on sufficient semantic collaborative learning. Extensive comparison experiments demonstrate the promising performance of our model compared with existing state-of-the-art competitors.


Pages: 26

50 items in total

[1] Coarse-to-Fine Semantic Alignment for Cross-Modal Moment Localization
Hu, Yupeng
Nie, Liqiang
Liu, Meng
Wang, Kun
Wang, Yinglong
Hua, Xian-Sheng
[J]. IEEE TRANSACTIONS ON IMAGE PROCESSING, 2021, 30 : 5933 - 5943
[2] Cross-modal Moment Localization in Videos
Liu, Meng
Wang, Xiang
Nie, Liqiang
Tian, Qi
Chen, Baoquan
Chua, Tat-Seng
[J]. PROCEEDINGS OF THE 2018 ACM MULTIMEDIA CONFERENCE (MM'18), 2018 : 843 - 851
[3] STRONG: Spatio-Temporal Reinforcement Learning for Cross-Modal Video Moment Localization
Cao, Da
Zeng, Yawen
Liu, Meng
He, Xiangnan
Wang, Meng
Qin, Zheng
[J]. MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, 2020 : 4162 - 4170
[4] Video Moment Localization via Deep Cross-Modal Hashing
Hu, Yupeng
Liu, Meng
Su, Xiaobin
Gao, Zan
Nie, Liqiang
[J]. IEEE TRANSACTIONS ON IMAGE PROCESSING, 2021, 30 : 4667 - 4677
[5] Discriminative semantic transitive consistency for cross-modal learning
Parida, Kranti Kumar
Sharma, Gaurav
[J]. COMPUTER VISION AND IMAGE UNDERSTANDING, 2022, 219
[6] Cross-modal semantic priming
Tabossi, P
[J]. LANGUAGE AND COGNITIVE PROCESSES, 1996, 11 (06) : 569 - 576
[7] Cross-Modal Semantic Communications
Li, Ang
Wei, Xin
Wu, Dan
Zhou, Liang
[J]. IEEE WIRELESS COMMUNICATIONS, 2022, 29 (06) : 144 - 151
[8] Cross-Modal Collaborative Communications
Zhou, Liang
Wu, Dan
Chen, Jianxin
Wei, Xin
[J]. IEEE WIRELESS COMMUNICATIONS, 2020, 27 (02) : 112 - 117
[9] Momentum Cross-Modal Contrastive Learning for Video Moment Retrieval
Han, De
Cheng, Xing
Guo, Nan
Ye, Xiaochun
Rainer, Benjamin
Priller, Peter
[J]. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (07) : 5977 - 5994
[10] Modal-adversarial Semantic Learning Network for Extendable Cross-modal Retrieval
Xu, Xing
Song, Jingkuan
Lu, Huimin
Yang, Yang
Shen, Fumin
Huang, Zi
[J]. ICMR '18: PROCEEDINGS OF THE 2018 ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, 2018 : 46 - 54
