BaSSL: Boundary-aware Self-Supervised Learning for Video Scene Segmentation

Authors
Mun, Jonghwan [1]
Shin, Minchul [1]
Han, Gunsoo [1]
Lee, Sangho [2]
Ha, Seongsu [2]
Lee, Joonseok [2]
Kim, Eun-Sol [3]
Affiliations
[1] Kakao Brain, Seongnam, South Korea
[2] Seoul Natl Univ, Grad Sch Data Sci, Seoul, South Korea
[3] Hanyang Univ, Dept Comp Sci, Seoul, South Korea
Keywords
Video scene segmentation; Self-supervised learning;
DOI
10.1007/978-3-031-26316-3_29
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Self-supervised learning has drawn attention for its effectiveness in learning in-domain representations without ground-truth annotations; in particular, properly designed pretext tasks have been shown to bring significant performance gains on downstream tasks. Inspired by this, we tackle video scene segmentation, the task of temporally localizing scene boundaries in a long video, with a self-supervised learning framework in which we mainly focus on designing effective pretext tasks. In our framework, given a long video, we adopt a sliding window scheme; from the sequence of shots in each window, we discover the moment with the maximum semantic transition and leverage it as a pseudo-boundary to facilitate pre-training. Specifically, we introduce three novel boundary-aware pretext tasks: 1) Shot-Scene Matching (SSM), 2) Contextual Group Matching (CGM) and 3) Pseudo-boundary Prediction (PP). SSM and CGM guide the model to maximize intra-scene similarity and inter-scene discrimination by capturing contextual relations between shots, while PP encourages the model to identify transitional moments. We perform an extensive analysis to validate the effectiveness of our method and achieve a new state of the art on the MovieNet-SSeg benchmark. The code is available at https://github.com/kakaobrain/bassl.
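The sliding-window and pseudo-boundary idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that); it assumes each shot is already encoded as a feature vector and, as a simplifying assumption, takes the "moment with the maximum semantic transition" to be the split point where the mean features of the left and right segments are least similar under cosine similarity. The function names `sliding_windows` and `find_pseudo_boundary` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def sliding_windows(num_shots: int, window_size: int, stride: int = 1) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs of fixed-size windows over a shot sequence."""
    return [(s, s + window_size) for s in range(0, num_shots - window_size + 1, stride)]


def mean_vector(vectors: Sequence[Vector]) -> List[float]:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def find_pseudo_boundary(shots: Sequence[Vector]) -> int:
    """Return index b such that shots[:b+1] and shots[b+1:] are least similar.

    Assumption (illustrative only): the semantic transition is measured as the
    cosine similarity between the mean features of the two segments.
    """
    if len(shots) < 2:
        raise ValueError("need at least two shots to place a boundary")
    best_index, best_similarity = 0, float("inf")
    for b in range(len(shots) - 1):
        left = mean_vector(shots[: b + 1])
        right = mean_vector(shots[b + 1:])
        similarity = cosine(left, right)
        if similarity < best_similarity:
            best_index, best_similarity = b, similarity
    return best_index
```

For a window such as `[[1, 0], [0.9, 0.1], [1, 0.05], [0, 1], [0.1, 0.9]]`, the sketch places the pseudo-boundary after the third shot (index 2), where the features switch from one cluster to the other. A pretext task like PP could then treat that shot as a positive boundary label during pre-training.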
Pages: 485-501
Page count: 17