UBoCo: Unsupervised Boundary Contrastive Learning for Generic Event Boundary Detection

Cited by: 5
Authors
Kang, Hyolim [1]
Kim, Jinwoo [1]
Kim, Taehyun [1]
Kim, Seon Joo [1]
Affiliations
[1] Yonsei Univ, Seoul, South Korea
DOI
10.1109/CVPR52688.2022.01944
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Generic Event Boundary Detection (GEBD) is a newly suggested video understanding task that aims to find one level deeper semantic boundaries of events. Bridging the gap between natural human perception and video understanding, it has various potential applications, including interpretable and semantically valid video parsing. Because the task is still at an early stage of development, existing GEBD solvers are simple extensions of methods for related video understanding tasks and disregard GEBD's distinctive characteristics. In this paper, we propose a novel framework for unsupervised/supervised GEBD that uses the Temporal Self-similarity Matrix (TSM) as the video representation. The new Recursive TSM Parsing (RTP) algorithm exploits local diagonal patterns in the TSM to detect boundaries, and it is combined with the Boundary Contrastive (BoCo) loss to train our encoder to generate more informative TSMs. Our framework can be applied to both unsupervised and supervised settings, and both achieve state-of-the-art performance by a huge margin on the GEBD benchmark. In particular, our unsupervised method outperforms the previous state-of-the-art "supervised" model, implying its exceptional efficacy.
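To make the idea of a Temporal Self-similarity Matrix concrete, the following is a minimal sketch and not the authors' implementation. It assumes that per-frame feature vectors are already available and that cosine similarity is the similarity measure; the abstract specifies neither. The function names `temporal_self_similarity` and `adjacent_similarity`, the toy features, and the final "lowest adjacent similarity" step are hypothetical. That last step is a naive stand-in used only to show the kind of diagonal pattern involved; it is not the Recursive TSM Parsing algorithm.

```python
import math


def temporal_self_similarity(features):
    """Return the T x T cosine-similarity matrix of per-frame feature vectors.

    Assumption: cosine similarity is used; the abstract does not state the measure.
    """
    norms = [math.sqrt(sum(x * x for x in f)) for f in features]
    tsm = []
    for fi, ni in zip(features, norms):
        row = []
        for fj, nj in zip(features, norms):
            dot = sum(a * b for a, b in zip(fi, fj))
            # Guard against zero vectors to avoid division by zero.
            row.append(dot / (ni * nj) if ni and nj else 0.0)
        tsm.append(row)
    return tsm


def adjacent_similarity(tsm):
    """Read the first off-diagonal: similarity of each frame to its successor."""
    return [tsm[t][t + 1] for t in range(len(tsm) - 1)]


# Hypothetical toy features: three frames of one event, then three of another.
frames = [
    [1.0, 0.0], [0.9, 0.1], [1.0, 0.05],
    [0.0, 1.0], [0.1, 0.9], [0.05, 1.0],
]

tsm = temporal_self_similarity(frames)
sims = adjacent_similarity(tsm)

# Naive illustration only (not RTP): take the weakest frame-to-frame link.
boundary = min(range(len(sims)), key=sims.__getitem__)
print(boundary)
```

On this toy input the weakest link lies between the third and fourth frames (index 2), which is where the two block-like regions along the diagonal of the matrix meet.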
Pages: 20041-20050
Page count: 10