Weakly Supervised Temporal Sentence Grounding with Gaussian-based Contrastive Proposal Learning

Cited by: 42
Authors
Zheng, Minghang [1]
Huang, Yanjie [1]
Chen, Qingchao [2]
Peng, Yuxin [1]
Liu, Yang [1, 3]
Affiliations
[1] Peking Univ, Wangxuan Inst Comp Technol, Beijing, Peoples R China
[2] Peking Univ, Natl Inst Hlth Data Sci, Beijing, Peoples R China
[3] Beijing Inst Gen Artificial Intelligence, Beijing, Peoples R China
Funding
National Natural Science Foundation of China
DOI
10.1109/CVPR52688.2022.01511
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Temporal sentence grounding aims to detect the most salient moment in an untrimmed video that corresponds to a natural language query. Because labeling temporal boundaries is labor-intensive and subjective, weakly supervised methods have recently received increasing attention. Most existing weakly supervised methods generate proposals with sliding windows, which are content-independent and of low quality. Moreover, they train the model to distinguish positive visual-language pairs from negative ones randomly collected from other videos, ignoring the highly confusing video segments within the same video. In this paper, we propose Contrastive Proposal Learning (CPL) to overcome these limitations. Specifically, we use multiple learnable Gaussian functions to generate both positive and negative proposals within the same video, which can characterize the multiple events in a long video. We then propose a controllable easy-to-hard negative proposal mining strategy to collect negative samples within the same video, which eases model optimization and enables CPL to distinguish highly confusing scenes. Experiments show that our method achieves state-of-the-art performance on the Charades-STA and ActivityNet Captions datasets. The code and models are available at https://github.com/minghangz/cpl.
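The idea of Gaussian proposals and easy-to-hard negative mining described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the repository linked above for that); the function names `gaussian_mask` and `negative_centers`, the `sigma_scale` parameter, and the linear offset schedule are all hypothetical choices made only for illustration. Proposal centers and widths are assumed to be normalized to [0, 1].

```python
import math


def gaussian_mask(num_clips, center, width, sigma_scale=9.0):
    """Weight each clip by a Gaussian centered on a proposal.

    center and width are normalized to [0, 1]. sigma_scale is a
    hypothetical hyperparameter: larger values give a sharper mask.
    """
    if num_clips < 2:
        raise ValueError("num_clips must be at least 2")
    sigma = max(width, 1e-2) / sigma_scale
    positions = [i / (num_clips - 1) for i in range(num_clips)]
    return [math.exp(-((p - center) ** 2) / (2 * sigma ** 2)) for p in positions]


def negative_centers(center, width, progress):
    """Pick left/right negative proposal centers within the same video.

    progress runs from 0.0 (start of training) to 1.0 (end). The offset
    shrinks linearly from 1.0 down to `width`, so negatives start far
    from the positive proposal (easy) and end up next to it (hard).
    This schedule is an assumption, not the schedule used in the paper.
    """
    offset = width + (1.0 - progress) * (1.0 - width)
    left = max(0.0, center - offset)
    right = min(1.0, center + offset)
    return left, right


# Positive proposal covering the middle of an 11-clip video.
positive = gaussian_mask(11, center=0.5, width=0.4)

# Negatives early in training (far away) and late in training (adjacent).
easy_left, easy_right = negative_centers(0.5, 0.2, progress=0.0)
hard_left, hard_right = negative_centers(0.5, 0.2, progress=1.0)
```

In a learned model the center and width would be predicted from the video and query rather than fixed by hand, and the masks would weight clip features before they are scored against the sentence.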
Pages: 15534-15543
Number of pages: 10
Related papers
50 in total
  • [1] Counterfactual contrastive learning for weakly supervised temporal sentence grounding
    Xu, Yenan
    Xu, Wanru
    Miao, Zhenjiang
    NEUROCOMPUTING, 2025, 624
  • [2] Contrastive Perturbation Network for Weakly Supervised Temporal Sentence Grounding
    Han, Tingting
    Lv, Yuanxin
    Yu, Zhou
    Yu, Jun
    Fan, Jianping
    Yuan, Liu
    PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT I, 2024, 14425 : 446 - 460
  • [3] Adaptive proposal network based on generative adversarial learning for weakly supervised temporal sentence grounding
    Wang, Weikang
    Su, Yuting
    Liu, Jing
    Jing, Peiguang
    PATTERN RECOGNITION LETTERS, 2024, 179 : 9 - 16
  • [4] Local Correspondence Network for Weakly Supervised Temporal Sentence Grounding
    Yang, Wenfei
    Zhang, Tianzhu
    Zhang, Yongdong
    Wu, Feng
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2021, 30 : 3252 - 3262
  • [5] Weakly Supervised Temporal Action Localization Based on Contrastive Learning
    Hou Y.
    Li Y.
    Guo Z.
    Tianjin Daxue Xuebao (Ziran Kexue yu Gongcheng Jishu Ban)/Journal of Tianjin University Science and Technology, 2023, 56 (01): : 73 - 80
  • [6] Atomic-action-based Contrastive Network for Weakly Supervised Temporal Language Grounding
    Wu, Hongzhou
    Lyu, Yifan
    Shen, Xingyu
    Zhao, Xuechen
    Wang, Mengzhu
    Zhang, Xiang
    Luo, Zhigang
    2023 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, ICME, 2023, : 1523 - 1528
  • [7] Reinforcement Learning with Multi-Policy Movement Strategy for Weakly Supervised Temporal Sentence Grounding
    Jiang, Shan
    Kong, Yuqiu
    Zhang, Lihe
    Yin, Baocai
    APPLIED SCIENCES-BASEL, 2024, 14 (21):
  • [8] Dual Semantic Reconstruction Network for Weakly Supervised Temporal Sentence Grounding
    Tang, Kefan
    He, Lihuo
    Wang, Nannan
    Gao, Xinbo
    IEEE TRANSACTIONS ON MULTIMEDIA, 2025, 27 : 95 - 107
  • [9] Query-aware multi-scale proposal network for weakly supervised temporal sentence grounding in videos
    Zhou, Mingyao
    Chen, Wenjing
    Sun, Hao
    Xie, Wei
    Dong, Ming
    Lu, Xiaoqiang
    KNOWLEDGE-BASED SYSTEMS, 2024, 304
  • [10] Weakly Supervised Contrastive Learning
    Zheng, Mingkai
    Wang, Fei
    You, Shan
    Qian, Chen
    Zhang, Changshui
    Wang, Xiaogang
    Xu, Chang
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 10022 - 10031