Towards Global Video Scene Segmentation with Context-Aware Transformer

Authors
Yang, Yang [1, 2, 3]
Huang, Yurui [1]
Guo, Weili [1]
Xu, Baohua [4]
Xia, Dingyin
Affiliations
[1] Nanjing Univ Sci & Technol, Nanjing, Peoples R China
[2] NUAA, MIIT Key Lab Pattern Anal & Machine Intelligence, Nanjing, Peoples R China
[3] NJU, State Key Lab Novel Software Technol, Nanjing, Peoples R China
[4] HUAWEI CBG Edu Lab, Montreal, PQ, Canada
Source
THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 3 | 2023
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Videos such as movies or TV episodes usually need their long storylines divided into cohesive units, i.e., scenes, to facilitate the understanding of video semantics. The key challenge lies in finding scene boundaries by comprehensively considering the complex temporal structure and semantic information. To this end, we introduce a novel Context-Aware Transformer (CAT) with a self-supervised learning framework to learn high-quality shot representations for generating well-bounded scenes. More specifically, we design CAT with local-global self-attention, which effectively considers both long-term and short-term context to improve shot encoding. To train CAT, we adopt a self-supervised learning scheme. First, we leverage shot-to-scene level pretext tasks to facilitate pre-training with pseudo boundaries, which guides CAT to learn discriminative shot representations that maximize intra-scene similarity and inter-scene discrimination in an unsupervised manner. Then, we transfer the contextual representations to fine-tune CAT with supervised data, which encourages CAT to accurately detect boundaries for scene segmentation. As a result, CAT learns context-aware shot representations and provides global guidance for scene segmentation. Our empirical analyses show that CAT achieves state-of-the-art performance on the scene segmentation task on the MovieNet dataset, e.g., improving AP by 2.15.
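The local-global self-attention idea mentioned in the abstract can be illustrated with a small sketch. This is not the CAT architecture from the paper: the function names (`attend`, `local_global_encode`), the single-head dot-product attention without learned projections, the `window` parameter, and the averaging of the local and global views are all assumptions made for illustration only.

```python
import math


def softmax(scores):
    """Numerically stable softmax; entries equal to -inf receive weight 0."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(shots, window=None):
    """Single-head dot-product self-attention over shot feature vectors.

    window=None -> global attention (every shot sees every shot).
    window=k    -> local attention (shot i sees shot j only if |i - j| <= k).
    Hypothetical helper: no learned query/key/value projections are used.
    """
    n, d = len(shots), len(shots[0])
    out = []
    for i in range(n):
        scores = []
        for j in range(n):
            if window is not None and abs(i - j) > window:
                scores.append(float("-inf"))  # masked: outside the local window
            else:
                dot = sum(a * b for a, b in zip(shots[i], shots[j]))
                scores.append(dot / math.sqrt(d))
        weights = softmax(scores)
        out.append(
            [sum(w * shots[j][k] for j, w in enumerate(weights)) for k in range(d)]
        )
    return out


def local_global_encode(shots, window=1):
    """Fuse short-term and long-term context (averaging is an assumed fusion rule)."""
    local = attend(shots, window=window)
    global_view = attend(shots, window=None)
    return [
        [(l + g) / 2 for l, g in zip(lv, gv)]
        for lv, gv in zip(local, global_view)
    ]


# Toy input: four 2-dimensional shot features (made-up values).
shots = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
encoded = local_global_encode(shots, window=1)
```

In this toy setup, each encoded shot mixes information from its immediate neighbours with information from the whole sequence, which is the general intuition behind combining short-term and long-term context.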
Pages: 3206-3213
Page count: 8