Weakly Supervised Temporal Sentence Grounding with Gaussian-based Contrastive Proposal Learning

Cited by: 42
Authors
Zheng, Minghang [1]
Huang, Yanjie [1]
Chen, Qingchao [2]
Peng, Yuxin [1]
Liu, Yang [1, 3]
Affiliations
[1] Peking Univ, Wangxuan Inst Comp Technol, Beijing, Peoples R China
[2] Peking Univ, Natl Inst Hlth Data Sci, Beijing, Peoples R China
[3] Beijing Inst Gen Artificial Intelligence, Beijing, Peoples R China
Funding
National Natural Science Foundation of China
DOI
10.1109/CVPR52688.2022.01511
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Temporal sentence grounding aims to detect the most salient moment corresponding to a natural language query in untrimmed videos. As labeling the temporal boundaries is labor-intensive and subjective, weakly-supervised methods have recently received increasing attention. Most existing weakly-supervised methods generate proposals with sliding windows, which are content-independent and of low quality. Moreover, they train their models to distinguish positive visual-language pairs from negative ones randomly collected from other videos, ignoring the highly confusing video segments within the same video. In this paper, we propose Contrastive Proposal Learning (CPL) to overcome the above limitations. Specifically, we use multiple learnable Gaussian functions to generate both positive and negative proposals within the same video, which can characterize the multiple events in a long video. Then, we propose a controllable easy-to-hard negative proposal mining strategy to collect negative samples within the same video, which eases model optimization and enables CPL to distinguish highly confusing scenes. The experiments show that our method achieves state-of-the-art performance on the Charades-STA and ActivityNet Captions datasets. The code and models are available at https://github.com/minghangz/cpl.
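The Gaussian proposal idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the repository linked above for that). The function names `gaussian_mask` and `negative_centers`, the `sigma` scaling factor, and the linear `hardness` schedule are all hypothetical choices made for illustration. The sketch assumes a proposal is described by a normalized center and width in [0, 1] and that negatives are placed to the left and right of the positive span.

```python
import math


def gaussian_mask(num_clips, center, width, sigma=4.0):
    """Soft temporal mask over a video split into num_clips clips.

    center and width are fractions of the video length in [0, 1].
    sigma is a hypothetical scaling factor: larger values give a sharper mask.
    """
    std = max(width, 1e-2) / sigma  # avoid a zero-width Gaussian
    mask = []
    for i in range(num_clips):
        t = (i + 0.5) / num_clips  # normalized midpoint of clip i
        mask.append(math.exp(-((t - center) ** 2) / (2 * std ** 2)))
    return mask


def negative_centers(center, width, hardness):
    """Centers for one left and one right negative proposal (illustrative).

    hardness=0 places them at the video ends (easy negatives);
    hardness=1 places them at the edges of the positive span (hard negatives).
    """
    left_edge = max(center - width / 2, 0.0)
    right_edge = min(center + width / 2, 1.0)
    left = hardness * left_edge
    right = 1.0 - hardness * (1.0 - right_edge)
    return left, right


# Example: one positive proposal and its negatives at a medium hardness level.
positive = gaussian_mask(num_clips=10, center=0.5, width=0.2)
left_c, right_c = negative_centers(center=0.5, width=0.2, hardness=0.5)
left_negative = gaussian_mask(num_clips=10, center=left_c, width=0.2)
right_negative = gaussian_mask(num_clips=10, center=right_c, width=0.2)
```

In this sketch, raising `hardness` over the course of training moves the negative masks from the ends of the video toward the boundaries of the positive span, which is one simple way to picture an easy-to-hard schedule for negatives drawn from the same video.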
Pages: 15534-15543
Page count: 10