Weakly Supervised Temporal Sentence Grounding with Gaussian-based Contrastive Proposal Learning

Cited by: 42
Authors
Zheng, Minghang [1 ]
Huang, Yanjie [1 ]
Chen, Qingchao [2 ]
Peng, Yuxin [1 ]
Liu, Yang [1 ,3 ]
Affiliations
[1] Peking Univ, Wangxuan Inst Comp Technol, Beijing, Peoples R China
[2] Peking Univ, Natl Inst Hlth Data Sci, Beijing, Peoples R China
[3] Beijing Inst Gen Artificial Intelligence, Beijing, Peoples R China
Funding
National Natural Science Foundation of China
DOI
10.1109/CVPR52688.2022.01511
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Temporal sentence grounding aims to detect the most salient moment in an untrimmed video that corresponds to a natural language query. Because labeling temporal boundaries is labor-intensive and subjective, weakly-supervised methods have recently received increasing attention. Most existing weakly-supervised methods generate proposals with sliding windows, which are content-independent and of low quality. Moreover, they train the model to distinguish positive visual-language pairs from negative pairs randomly collected from other videos, ignoring the highly confusing segments within the same video. In this paper, we propose Contrastive Proposal Learning (CPL) to overcome these limitations. Specifically, we use multiple learnable Gaussian functions to generate both positive and negative proposals within the same video, which can characterize the multiple events in a long video. We then propose a controllable easy-to-hard negative proposal mining strategy to collect negative samples within the same video, which eases model optimization and enables CPL to distinguish highly confusing scenes. Experiments show that our method achieves state-of-the-art performance on the Charades-STA and ActivityNet Captions datasets. The code and models are available at https://github.com/minghangz/cpl.
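The abstract describes two mechanisms: proposals expressed as learnable Gaussian functions over the video timeline, and an easy-to-hard schedule for negative proposals drawn from the same video. The Python sketch below illustrates both ideas under stated assumptions only. The helper names (`gaussian_mask`, `pool`, `negative_centers`, `curriculum`, `overlap`), the choice sigma = width / 4, the placement of negatives beside the positive, and the linear schedule are all hypothetical illustrations, not the CPL implementation (the linked repository holds that). In CPL the Gaussian centers and widths are learnable; here they are plain numbers.

```python
import math
from typing import List, Tuple


def gaussian_mask(num_clips: int, center: float, width: float,
                  sigma_scale: float = 4.0) -> List[float]:
    """Per-clip weights of one Gaussian proposal over a video.

    `center` and `width` are normalized to [0, 1]. sigma = width / sigma_scale
    (so +/-2 sigma spans the width when sigma_scale = 4) is an assumption.
    Weights are rescaled so the largest clip weight is 1.
    """
    if num_clips < 2:
        raise ValueError("need at least two clips")
    sigma = max(width / sigma_scale, 1e-6)
    positions = [i / (num_clips - 1) for i in range(num_clips)]
    weights = [math.exp(-((t - center) ** 2) / (2.0 * sigma ** 2)) for t in positions]
    peak = max(weights)
    return [w / peak for w in weights] if peak > 0 else weights


def pool(features: List[List[float]], mask: List[float]) -> List[float]:
    """Mask-weighted average of clip features: one proposal embedding."""
    total = sum(mask) or 1.0
    dim = len(features[0])
    return [sum(m * f[d] for m, f in zip(mask, features)) / total for d in range(dim)]


def negative_centers(center: float, width: float, difficulty: float) -> Tuple[float, float]:
    """Hypothetical in-video negatives to the left and right of a positive.

    difficulty = 0 places them one full width away (easy, little overlap);
    difficulty = 1 places them half a width away (hard, more overlap).
    """
    shift = width * (1.0 - 0.5 * difficulty)
    left = min(max(center - shift, 0.0), 1.0)
    right = min(max(center + shift, 0.0), 1.0)
    return left, right


def curriculum(epoch: int, warmup_epochs: int) -> float:
    """Hypothetical linear easy-to-hard schedule for the difficulty knob."""
    return min(1.0, epoch / max(warmup_epochs, 1))


def overlap(a: List[float], b: List[float]) -> float:
    """Soft IoU between two masks: sum of minima over sum of maxima."""
    denom = sum(max(x, y) for x, y in zip(a, b))
    return sum(min(x, y) for x, y in zip(a, b)) / denom if denom > 0 else 0.0


if __name__ == "__main__":
    num_clips, center, width = 64, 0.5, 0.2
    positive = gaussian_mask(num_clips, center, width)
    for epoch in (0, 5, 10):
        left_center, _ = negative_centers(center, width, curriculum(epoch, warmup_epochs=10))
        negative = gaussian_mask(num_clips, left_center, width)
        # Overlap with the positive grows as training moves from easy to hard negatives.
        print(epoch, round(overlap(positive, negative), 3))
```

The single `difficulty` parameter is one simple way to make negative mining "controllable": early in training the negatives barely overlap the positive and are easy to separate, and later they move closer and become more confusing.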
Pages: 15534-15543
Number of pages: 10