Spatiotemporal contrastive modeling for video moment retrieval

Cited by: 1
Authors
Wang, Yi [1, 2]
Li, Kun [1, 2]
Chen, Guoliang [1, 2]
Zhang, Yan [1, 2]
Guo, Dan [1, 2]
Wang, Meng [1, 2]
Affiliations
[1] Hefei Univ Technol, Sch Comp Sci & Informat Engn, Hefei 230601, Anhui, Peoples R China
[2] Hefei Univ Technol, Sch Artificial Intelligence, Hefei 230601, Anhui, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Video moment retrieval; Spatiotemporal modeling; Contrastive learning; Language query; Temporal localization; Action recognition
DOI
10.1007/s11280-022-01105-3
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
With the rapid development of social networks, video data has been growing explosively. Video is an important social medium, and its spatiotemporal characteristics have attracted considerable attention in recommendation systems and video understanding. In this paper, we discuss the video moment retrieval (VMR) task, which locates moments in a video based on different textual queries. Existing methods fall into two pipelines: 1) proposal-free approaches, which mainly modify the multi-modal interaction strategy; and 2) proposal-based methods, which focus on designing advanced proposal generation paradigms. Recently, contrastive representation learning has been successfully applied to video understanding. From this new perspective, we propose a VMR framework, named spatiotemporal contrastive network (STCNet), that learns discriminative boundary features for video grounding through contrastive learning. Specifically, we propose a boundary matching sampling module for dense sampling of negative samples. Contrastive learning refines the feature representations in the training phase without any additional cost at inference. On three public datasets, Charades-STA, ActivityNet Captions and TACoS, our proposed method achieves competitive performance.
Pages: 1525-1544
Page count: 20
Related papers
50 in total
  • [31] Hybrid Contrastive Quantization for Efficient Cross-View Video Retrieval
    Wang, Jinpeng
    Chen, Bin
    Liao, Dongliang
    Zeng, Ziyun
    Li, Gongfu
    Xia, Shu-Tao
    Xu, Jin
    PROCEEDINGS OF THE ACM WEB CONFERENCE 2022 (WWW'22), 2022, : 3020 - 3030
  • [32] Expert-guided contrastive learning for video-text retrieval
    Lee, Jewook
    Lee, Pilhyeon
    Park, Sungho
    Byun, Hyeran
    NEUROCOMPUTING, 2023, 536 : 50 - 58
  • [33] Cross-Modal Contrastive Hashing Retrieval for Infrared Video and EEG
    Han, Jianan
    Zhang, Shaoxing
    Men, Aidong
    Chen, Qingchao
    SENSORS, 2022, 22 (22)
  • [34] Semantic Relevance Learning for Video-Query Based Video Moment Retrieval
    Huo, Shuwei
    Zhou, Yuan
    Wang, Ruolin
    Xiang, Wei
    Kung, Sun-Yuan
    IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 9290 - 9301
  • [35] Integrating Video Retrieval and Moment Detection in a Unified Corpus for Video Question Answering
    Luo, Hongyin
    Mohtarami, Mitra
    Glass, James
    Krishnamurthy, Karthik
    Richardson, Brigitte
    INTERSPEECH 2019, 2019, : 599 - 603
  • [36] Learning Unsupervised Visual Representations using 3D Convolutional Autoencoder with Temporal Contrastive Modeling for Video Retrieval
    Kumar, Vidit
    Tripathi, Vikas
    Pant, Bhaskar
    INTERNATIONAL JOURNAL OF MATHEMATICAL ENGINEERING AND MANAGEMENT SCIENCES, 2022, 7 (02) : 272 - 287
  • [37] Fine-Grained Spatiotemporal Motion Alignment for Contrastive Video Representation Learning
    Zhu, Minghao
    Lin, Xiao
    Dang, Ronghao
    Liu, Chengju
    Chen, Qijun
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 4725 - 4736
  • [38] Mining spatiotemporal video patterns towards robust action retrieval
    Cao, Liujuan
    Ji, Rongrong
    Gao, Yue
    Liu, Wei
    Tian, Qi
    NEUROCOMPUTING, 2013, 105 : 61 - 69
  • [39] Spatiotemporal retrieval of dynamic video object trajectories in geographical scenes
    Xie, Yujia
    Wang, Meizhen
    Liu, Xuejun
    Wang, Ziran
    Mao, Bo
    Wang, Feiyue
    Wang, Xiaozhi
    TRANSACTIONS IN GIS, 2021, 25 (01) : 450 - 467
  • [40] Moment is Important: Language-Based Video Moment Retrieval via Adversarial Learning
    Zeng, Y.
    Cao, D.
    Lu, S.
    Zhang, H.
    Xu, J.
    Qin, Z.
    ACM Transactions on Multimedia Computing, Communications and Applications, 2022, 18 (02)