Global and Compact Video Context Embedding for Video Semantic Segmentation

Cited by: 0
Authors
Sun, Lei [1, 2]
Liu, Yun [3]
Sun, Guolei [2]
Wu, Min [3]
Xu, Zhijie [4]
Wang, Kaiwei [1]
Van Gool, Luc [2]
Affiliations
[1] Zhejiang Univ, Natl Res Ctr Opt Instrumentat, Hangzhou 310027, Peoples R China
[2] Swiss Fed Inst Technol, Comp Vis Lab, CH-8092 Zurich, Switzerland
[3] ASTAR, Inst Infocomm Res I2R, Singapore 138632, Singapore
[4] Univ Huddersfield, Ctr Visual & Immers Comp, Huddersfield HD1 3DH, England
Source
IEEE ACCESS | 2024 / Vol. 12
Funding
National Natural Science Foundation of China;
Keywords
Semantic segmentation; Context modeling; Feature extraction; Computational modeling; Sun; Optical flow; Shape; Video semantic segmentation; global video context; compact video context; video context embedding; NETWORK;
DOI
10.1109/ACCESS.2024.3409150
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Intuitively, global video context could benefit video semantic segmentation (VSS) if it is designed to simultaneously model global temporal and spatial dependencies for a holistic understanding of the semantic scenes in a video clip. However, we found that existing VSS approaches focus only on modeling local video context. This paper attempts to bridge this gap by learning global video context for VSS. Apart from being global, the video context should also be compact, given the large number of video feature tokens and the redundancy among nearby video frames. We then embed the learned global and compact video context into the features of the target video frame to improve their distinguishability. The proposed VSS method is dubbed Global and Compact Video Context Embedding (GCVCE). Because the context is compact, the number of global context tokens is very limited, so GCVCE is flexible and efficient for VSS. Since it may be too challenging to directly abstract a large number of video feature tokens into a small number of global context tokens, we further design a Cascaded Convolutional Downsampling (CCD) module before GCVCE to help it work better. A 1.6% improvement in mIoU on the popular VSPW dataset over previous state-of-the-art methods demonstrates the effectiveness and efficiency of GCVCE and CCD for VSS. Code and models will be made publicly available.
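Illustrative sketch (not from the paper): the abstract does not specify how the context tokens are learned or embedded, so the following assumes a generic two-step cross-attention. A handful of query vectors first summarize all clip tokens into a few context tokens, and the target-frame tokens then read from that summary. The names `cross_attention`, `embed_global_context`, and `context_queries` are hypothetical, the attention has no learned projections, and the CCD module is omitted. It uses only the Python standard library to stay dependency-free.

```python
import math
import random


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax_rows(matrix):
    """Apply a numerically stable softmax to every row."""
    result = []
    for row in matrix:
        peak = max(row)
        exps = [math.exp(value - peak) for value in row]
        total = sum(exps)
        result.append([e / total for e in exps])
    return result


def cross_attention(queries, keys_values):
    """Single-head scaled dot-product attention without learned projections."""
    dim = len(queries[0])
    keys_t = [list(col) for col in zip(*keys_values)]  # shape: dim x n_kv
    scores = [[s / math.sqrt(dim) for s in row] for row in matmul(queries, keys_t)]
    return matmul(softmax_rows(scores), keys_values)


def embed_global_context(clip_tokens, target_tokens, context_queries):
    """Assumed two-step scheme: compress the clip, then enrich the target frame."""
    # Step 1: a few queries attend over every token of every frame,
    # giving a small set of context tokens that covers the whole clip.
    context = cross_attention(context_queries, clip_tokens)
    # Step 2: each target-frame token attends over the context tokens;
    # the result is added back as a residual.
    update = cross_attention(target_tokens, context)
    enhanced = [[t + u for t, u in zip(t_row, u_row)]
                for t_row, u_row in zip(target_tokens, update)]
    return enhanced, context


# Toy sizes chosen for illustration only.
DIM, FRAMES, TOKENS_PER_FRAME, NUM_CONTEXT = 8, 4, 16, 3
rng = random.Random(0)
clip = [[rng.gauss(0.0, 1.0) for _ in range(DIM)]
        for _ in range(FRAMES * TOKENS_PER_FRAME)]
target = clip[-TOKENS_PER_FRAME:]  # treat the last frame as the target frame
queries = [[rng.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(NUM_CONTEXT)]

enhanced, context = embed_global_context(clip, target, queries)
print(len(context), len(enhanced))
```

In this sketch, each attention step scores at most (number of clip tokens) x (number of context tokens) pairs, rather than every clip token against every other clip token. That is one way a small context set can keep the cost low.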
Pages: 135589-135600
Page count: 12