Weakly Supervised Temporal Sentence Grounding with Gaussian-based Contrastive Proposal Learning

Times cited: 42
Authors
Zheng, Minghang [1]
Huang, Yanjie [1]
Chen, Qingchao [2]
Peng, Yuxin [1]
Liu, Yang [1, 3]
Affiliations
[1] Peking Univ, Wangxuan Inst Comp Technol, Beijing, Peoples R China
[2] Peking Univ, Natl Inst Hlth Data Sci, Beijing, Peoples R China
[3] Beijing Inst Gen Artificial Intelligence, Beijing, Peoples R China
Funding
National Natural Science Foundation of China
DOI
10.1109/CVPR52688.2022.01511
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Temporal sentence grounding aims to detect the moment in an untrimmed video that best corresponds to a natural language query. Because labeling temporal boundaries is labor-intensive and subjective, weakly-supervised methods have recently received increasing attention. Most existing weakly-supervised methods generate proposals with sliding windows, which are content-independent and of low quality. Moreover, they train their models to distinguish positive visual-language pairs from negative pairs randomly drawn from other videos, ignoring highly confusing segments within the same video. In this paper, we propose Contrastive Proposal Learning (CPL) to overcome these limitations. Specifically, we use multiple learnable Gaussian functions to generate both positive and negative proposals within the same video, which can characterize the multiple events in a long video. We then propose a controllable easy-to-hard negative proposal mining strategy to collect negative samples within the same video, which eases model optimization and enables CPL to distinguish highly confusing scenes. Experiments show that our method achieves state-of-the-art performance on the Charades-STA and ActivityNet Captions datasets. The code and models are available at https://github.com/minghangz/cpl.
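The abstract mentions proposals built from learnable Gaussian functions and an easy-to-hard strategy for mining negatives inside the same video. The sketch below illustrates those two ideas in plain Python under stated assumptions; it is not the authors' CPL code (that lives in the repository linked in the abstract). It assumes a Gaussian soft mask over normalized clip positions with standard deviation `width / sigma_scale`, a toy per-clip relevance score in place of the model's real scoring, a hinge (margin) loss as the contrastive objective, and a `hardness` parameter in [0, 1] that moves the negative proposal closer to the positive one. The names `gaussian_mask`, `masked_score`, `negative_center`, `margin_loss`, `sigma_scale`, `hardness`, and `max_offset` are all hypothetical.

```python
import math
from typing import List

# Hypothetical sketch of Gaussian proposals and easy-to-hard negative mining.
# This is NOT the CPL implementation; names and hyperparameters are illustrative.


def gaussian_mask(center: float, width: float, num_clips: int,
                  sigma_scale: float = 4.0) -> List[float]:
    """Soft temporal mask over num_clips clips, peaked at `center`.

    `center` and `width` are in normalized time [0, 1]. The standard deviation
    is width / sigma_scale, where sigma_scale is an assumed hyperparameter.
    """
    if num_clips < 2:
        raise ValueError("num_clips must be at least 2")
    sigma = max(width / sigma_scale, 1e-6)  # avoid division by zero
    times = [i / (num_clips - 1) for i in range(num_clips)]  # clip positions in [0, 1]
    return [math.exp(-((t - center) ** 2) / (2 * sigma ** 2)) for t in times]


def masked_score(clip_scores: List[float], mask: List[float]) -> float:
    """Mask-weighted average of per-clip query relevance (a stand-in scorer)."""
    return sum(s * m for s, m in zip(clip_scores, mask)) / sum(mask)


def negative_center(pos_center: float, hardness: float,
                    max_offset: float = 0.5) -> float:
    """Center of a hypothetical same-video negative proposal.

    hardness in [0, 1]: 0 places the negative max_offset away from the positive
    (easy), 1 places it 0.2 * max_offset away (hard, easily confused).
    """
    offset = max_offset * (1.0 - 0.8 * hardness)
    if pos_center + offset <= 1.0:
        return pos_center + offset
    return max(0.0, pos_center - offset)  # fall back to the left side of the video


def margin_loss(pos_score: float, neg_score: float, margin: float) -> float:
    """Hinge loss: the positive proposal should outscore the negative by `margin`."""
    return max(0.0, margin - (pos_score - neg_score))


if __name__ == "__main__":
    n = 32
    # Toy relevance: the query matches clips around normalized time 0.3.
    clip_scores = [math.exp(-((i / (n - 1) - 0.3) ** 2) / 0.005) for i in range(n)]
    pos = masked_score(clip_scores, gaussian_mask(0.3, 0.2, n))
    for hardness in (0.0, 0.5, 1.0):  # schedule from easy to hard negatives
        c = negative_center(0.3, hardness)
        neg = masked_score(clip_scores, gaussian_mask(c, 0.2, n))
        print(f"hardness={hardness:.1f} center={c:.2f} loss={margin_loss(pos, neg, 0.5):.4f}")
```

In this toy setup, far (easy) negatives score near zero and produce no loss, while raising `hardness` pulls the negative toward the positive and the margin loss starts to bite. A schedule like this is one plausible reading of a "controllable easy-to-hard" strategy, not the mechanism stated in the paper.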
Pages: 15534-15543
Number of pages: 10