Towards Global Video Scene Segmentation with Context-Aware Transformer

Citations: 0
Authors
Yang, Yang [1 ,2 ,3 ]
Huang, Yurui [1 ]
Guo, Weili [1 ]
Xu, Baohua [4 ]
Xia, Dingyin
Affiliations
[1] Nanjing Univ Sci & Technol, Nanjing, Peoples R China
[2] NUAA, MIIT Key Lab Pattern Anal & Machine Intelligence, Nanjing, Peoples R China
[3] NJU, State Key Lab Novel Software Technol, Nanjing, Peoples R China
[4] HUAWEI CBG Edu Lab, Montreal, PQ, Canada
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Long videos such as movies or TV episodes usually need to be divided into cohesive units, i.e., scenes, to facilitate the understanding of video semantics. The key challenge lies in finding scene boundaries by comprehensively considering the complex temporal structure and semantic information. To this end, we introduce a novel Context-Aware Transformer (CAT) with a self-supervised learning framework to learn high-quality shot representations for generating well-bounded scenes. More specifically, we design CAT with local-global self-attention, which effectively considers both long-term and short-term context to improve shot encoding. To train CAT, we adopt a self-supervised learning scheme. First, we leverage shot-to-scene level pretext tasks to facilitate pre-training with pseudo boundaries, which guides CAT to learn discriminative shot representations that maximize intra-scene similarity and inter-scene discrimination in an unsupervised manner. Then, we transfer the contextual representations to fine-tune CAT with supervised data, which encourages CAT to accurately detect boundaries for scene segmentation. As a result, CAT learns context-aware shot representations and provides global guidance for scene segmentation. Our empirical analyses show that CAT achieves state-of-the-art performance on the scene segmentation task on the MovieNet dataset, e.g., an improvement of 2.15 in AP.
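The local-global self-attention idea mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' architecture: the function names, the window size, the use of identity projections instead of learned query/key/value weights, and the concatenation of the two outputs are all assumptions made for illustration. It only shows how a banded mask restricts attention to neighboring shots (short-term context) while an unmasked pass attends over the whole sequence (long-term context).

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax; masked entries (-inf) become 0.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def local_mask(num_shots, window):
    # True where shot j lies within `window` positions of shot i.
    idx = np.arange(num_shots)
    return np.abs(idx[:, None] - idx[None, :]) <= window

def attention(x, mask=None):
    # Single-head scaled dot-product self-attention with identity
    # projections (an assumption; a real model learns Q/K/V weights).
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)
    if mask is not None:
        scores = np.where(mask, scores, -np.inf)
    return softmax(scores, axis=-1) @ x

def local_global_encode(shots, window=2):
    # Hypothetical fusion: concatenate short-term and long-term context.
    local = attention(shots, local_mask(len(shots), window))
    global_ = attention(shots)
    return np.concatenate([local, global_], axis=-1)

rng = np.random.default_rng(0)
shots = rng.normal(size=(8, 16))   # 8 shots, 16-dim features (toy data)
encoded = local_global_encode(shots, window=2)
print(encoded.shape)  # (8, 32)
```

When the window covers the whole sequence, the local branch reduces to the global one, which is a quick sanity check on the mask.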
Pages: 3206-3213
Number of pages: 8
Related papers
50 in total
  • [1] Context-aware Deformable Alignment for Video Object Segmentation
    Yang, Jie
    Xia, Mingfu
    Zhou, Xue
    2022 26TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2022, : 303 - 309
  • [2] Local-Global Context Aware Transformer for Language-Guided Video Segmentation
    Liang, Chen
    Wang, Wenguan
    Zhou, Tianfei
    Miao, Jiaxu
    Luo, Yawei
    Yang, Yi
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (08) : 10055 - 10069
  • [3] CDText: Scene text detector based on context-aware deformable transformer
    Wu, Yirui
    Kong, Qiran
    Yong, Lai
    Narducci, Fabio
    Wan, Shaohua
    PATTERN RECOGNITION LETTERS, 2023, 172 : 8 - 14
  • [4] Context-aware and local-aware fusion with transformer for medical image segmentation
    Xiao, Hanguang
    Li, Li
    Liu, Qiyuan
    Zhang, Qihang
    Liu, Junqi
    Liu, Zhi
    PHYSICS IN MEDICINE AND BIOLOGY, 2024, 69 (02):
  • [5] Survival Analysis based on Lung Tumor Segmentation using Global Context-aware Transformer in Multimodality
    Dao, Duy-Phuong
    Yang, Hyung-Jeong
    Ho, Ngoc-Huynh
    Pant, Sudarshan
    Kim, Soo-Hyung
    Lee, Guee-Sang
    Oh, In-Jae
    Kang, Sae-Ryung
    2022 26TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2022, : 5162 - 5169
  • [6] Adjustable Context-Aware Transformer
    Koohfar, Sepideh
    Dietz, Laura
    ADVANCED ANALYTICS AND LEARNING ON TEMPORAL DATA, AALTD 2022, 2023, 13812 : 3 - 17
  • [7] Towards realizing global scalability in context-aware systems
    Buchholz, T
    Linnhoff-Popien, C
    LOCATION- AND CONTEXT-AWARENESS, PROCEEDINGS, 2005, 3479 : 26 - 39
  • [8] Multi global context-aware transformer for ship name recognition in IoT
    Xian, Yunting
    Lu, Lu
    Qiu, Xuanrui
    Xian, Jing
    IET COMMUNICATIONS, 2025, 19 (01)
  • [9] Context-aware transformer for image captioning
    Yang, Xin
    Wang, Ying
    Chen, Haishun
    Li, Jie
    Huang, Tingting
    NEUROCOMPUTING, 2023, 549
  • [10] DPCTN: Dual path context-aware transformer network for medical image segmentation
    Song, Pengfei
    Yang, Zhe
    Li, Jinjiang
    Fan, Hui
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2023, 124