Spatiotemporal contrastive modeling for video moment retrieval

Cited by: 1
Authors
Wang, Yi [1, 2]
Li, Kun [1, 2]
Chen, Guoliang [1, 2]
Zhang, Yan [1, 2]
Guo, Dan [1, 2]
Wang, Meng [1, 2]
Affiliations
[1] Hefei Univ Technol, Sch Comp Sci & Informat Engn, Hefei 230601, Anhui, Peoples R China
[2] Hefei Univ Technol, Sch Artificial Intelligence, Hefei 230601, Anhui, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Video moment retrieval; Spatiotemporal modeling; Contrastive learning; Language query; Temporal localization; ACTION RECOGNITION;
DOI
10.1007/s11280-022-01105-3
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline classification code
0812 ;
Abstract
With the rapid development of social networks, video data has been growing explosively. Video is an important social medium, and its spatiotemporal characteristics have attracted considerable attention in recommendation systems and video understanding. In this paper, we address the video moment retrieval (VMR) task, which locates moments in a video based on textual queries. Existing methods follow two pipelines: 1) proposal-free approaches, which mainly modify the multi-modal interaction strategy; and 2) proposal-based methods, which focus on designing advanced proposal generation paradigms. Recently, contrastive representation learning has been successfully applied to video understanding. Taking a different perspective, we propose a new VMR framework, named spatiotemporal contrastive network (STCNet), which learns discriminative boundary features for video grounding through contrastive learning. Specifically, we propose a boundary matching sampling module for dense negative sampling. Contrastive learning refines the feature representations during training without any additional cost at inference. On three public datasets, Charades-STA, ActivityNet Captions and TACoS, our proposed method achieves competitive performance.
Pages: 1525 - 1544
Number of pages: 20
Related papers
50 in total
  • [1] Spatiotemporal contrastive modeling for video moment retrieval
    Yi Wang
    Kun Li
    Guoliang Chen
    Yan Zhang
    Dan Guo
    Meng Wang
    World Wide Web, 2023, 26 : 1525 - 1544
  • [2] Video Moment Retrieval with Hierarchical Contrastive Learning
    Zhang, Bolin
    Yang, Chao
    Jiang, Bin
    Zhou, Xiaokang
    PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022,
  • [3] Video Corpus Moment Retrieval with Contrastive Learning
    Zhang, Hao
    Sun, Aixin
    Jing, Wei
    Nan, Guoshun
    Zhen, Liangli
    Zhou, Joey Tianyi
    Goh, Rick Siow Mong
    SIGIR '21 - PROCEEDINGS OF THE 44TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, 2021, : 685 - 695
  • [4] Momentum Cross-Modal Contrastive Learning for Video Moment Retrieval
    Han, De
    Cheng, Xing
    Guo, Nan
    Ye, Xiaochun
    Rainer, Benjamin
    Priller, Peter
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (07) : 5977 - 5994
  • [5] Hybrid Spatiotemporal Contrastive Representation Learning for Content-Based Surgical Video Retrieval
    Kumar, Vidit
    Tripathi, Vikas
    Pant, Bhaskar
    Alshamrani, Sultan S.
    Dumka, Ankur
    Gehlot, Anita
    Singh, Rajesh
    Rashid, Mamoon
    Alshehri, Abdullah
    AlGhamdi, Ahmed Saeed
    ELECTRONICS, 2022, 11 (09)
  • [6] Spatiotemporal Contrastive Video Representation Learning
    Qian, Rui
    Meng, Tianjian
    Gong, Boqing
    Yang, Ming-Hsuan
    Wang, Huisheng
    Belongie, Serge
    Cui, Yin
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 6960 - 6970
  • [7] Adversarial Video Moment Retrieval by Jointly Modeling Ranking and Localization
    Cao, Da
    Zeng, Yawen
    Wei, Xiaochi
    Nie, Liqiang
    Hong, Richang
    Qin, Zheng
    MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, 2020, : 898 - 906
  • [8] Fast Video Moment Retrieval
    Gao, Junyu
    Xu, Changsheng
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 1503 - 1512
  • [9] Survey on Video Moment Retrieval
    Wang Y.
    Zhan Y.-W.
    Luo X.
    Liu M.
    Xu X.-S.
    Ruan Jian Xue Bao/Journal of Software, 2023, 34 (02) : 985 - 1006
  • [10] Cross-modal Contrastive Learning with Asymmetric Co-attention Network for Video Moment Retrieval
    Panta, Love
    Shrestha, Prashant
    Sapkota, Brabeem
    Bhattarai, Amrita
    Manandhar, Suresh
    Sah, Anand Kumar
    2024 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION WORKSHOPS, WACVW 2024, 2024, : 617 - 624