Semantic Collaborative Learning for Cross-Modal Moment Localization

Cited by: 0
Authors
Hu, Yupeng [1]
Wang, Kun [1]
Liu, Meng [2]
Tang, Haoyu [1]
Nie, Liqiang [3]
Affiliations
[1] Shandong Univ, Sch Software, Jinan 250101, Shandong, Peoples R China
[2] Shandong Jianzhu Univ, Sch Comp Sci & Technol, Jinan 250101, Shandong, Peoples R China
[3] Harbin Inst Technol Shenzhen, Sch Comp Sci & Technol, Shenzhen 518055, Peoples R China
Keywords
Cross-modal moment localization; intra-modal semantic understanding; inter-modal semantic collaboration; VIDEO; NETWORK; TEXT;
DOI
10.1145/3620669
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Localizing a desired moment within an untrimmed video via a given natural language query, i.e., cross-modal moment localization, has attracted widespread research attention recently. However, it is a challenging task because it requires not only accurately understanding intra-modal semantic information, but also explicitly capturing inter-modal semantic correlations (consistency and complementarity). Existing efforts mainly focus on intra-modal semantic understanding and inter-modal semantic alignment, while ignoring necessary semantic supplement. Consequently, we present a cross-modal semantic perception network for more effective intra-modal semantic understanding and inter-modal semantic collaboration. Concretely, we design a dual-path representation network for intra-modal semantic modeling. Meanwhile, we develop a semantic collaborative network to achieve multi-granularity semantic alignment and hierarchical semantic supplement. Thereby, effective moment localization can be achieved based on sufficient semantic collaborative learning. Extensive comparison experiments demonstrate the promising performance of our model compared with existing state-of-the-art competitors.
Pages: 26