Global and Compact Video Context Embedding for Video Semantic Segmentation

Times cited: 0

Authors:

Sun, Lei [1, 2]

Liu, Yun [3]

Sun, Guolei [2]

Wu, Min [3]

Xu, Zhijie [4]

Wang, Kaiwei [1]

Van Gool, Luc [2]

Affiliations:

[1] Zhejiang Univ, Natl Res Ctr Opt Instrumentat, Hangzhou 310027, Peoples R China

[2] Swiss Fed Inst Technol, Comp Vis Lab, CH-8092 Zurich, Switzerland

[3] ASTAR, Inst Infocomm Res I2R, Singapore 138632, Singapore

[4] Univ Huddersfield, Ctr Visual & Immers Comp, Huddersfield HD1 3DH, England

Source:

IEEE ACCESS | 2024 / Vol. 12

Funding:

National Natural Science Foundation of China;

Keywords:

Semantic segmentation; Context modeling; Feature extraction; Computational modeling; Sun; Optical flow; Shape; Video semantic segmentation; global video context; compact video context; video context embedding; NETWORK;

DOI:

10.1109/ACCESS.2024.3409150

Chinese Library Classification number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

Intuitively, global video context could benefit video semantic segmentation (VSS) if it is designed to simultaneously model global temporal and spatial dependencies for a holistic understanding of the semantic scenes in a video clip. However, we found that existing VSS approaches focus only on modeling local video context. This paper attempts to bridge this gap by learning global video context for VSS. Apart from being global, the video context should also be compact, given the large number of video feature tokens and the redundancy among nearby video frames. We then embed the learned global and compact video context into the features of the target video frame to improve their distinguishability. The proposed VSS method is dubbed Global and Compact Video Context Embedding (GCVCE). Because the context is compact, the number of global context tokens is very small, which makes GCVCE flexible and efficient for VSS. Since it may be too challenging to directly abstract a large number of video feature tokens into a small number of global context tokens, we further design a Cascaded Convolutional Downsampling (CCD) module before GCVCE to help it work better. A 1.6% mIoU improvement over previous state-of-the-art methods on the popular VSPW dataset demonstrates the effectiveness and efficiency of GCVCE and CCD for VSS. Code and models will be made publicly available.
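The abstract describes the pipeline only at a high level: a CCD module shrinks the clip's feature tokens, a small set of global context tokens summarizes the whole clip, and these tokens are then embedded into the target frame's features. The NumPy sketch below illustrates that data flow under our own assumptions and is not the authors' implementation. Each CCD stage is modeled as a 2×2, stride-2 convolution followed by ReLU; the compact context is obtained by letting M learnable query tokens cross-attend to the downsampled clip tokens; and embedding is a residual cross-attention from the target-frame tokens to those M context tokens. All function names (`conv_downsample_2x`, `cross_attention`, `gcvce_sketch`), the single-head attention without learned Q/K/V projections, and the random weights are hypothetical.

```python
import numpy as np

# Hypothetical sketch of the data flow described in the GCVCE abstract;
# NOT the authors' code. Weights are random, and attention is single-head
# with no learned Q/K/V projections.


def softmax(x, axis=-1):
    """Numerically stable softmax along `axis`."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, keys_values):
    """Scaled dot-product attention: queries (Q, C) attend to keys_values (K, C)."""
    scale = 1.0 / np.sqrt(queries.shape[-1])
    weights = softmax(queries @ keys_values.T * scale, axis=-1)  # (Q, K)
    return weights @ keys_values  # (Q, C)


def conv_downsample_2x(feat, weight):
    """Assumed CCD stage: 2x2 conv with stride 2, then ReLU.

    feat: (T, H, W, C); weight: (4*C, C_out) -> (T, H//2, W//2, C_out).
    """
    t, h, w, c = feat.shape
    h2, w2 = h // 2, w // 2
    # Gather non-overlapping 2x2 patches and flatten each to a 4*C vector.
    patches = feat[:, : 2 * h2, : 2 * w2].reshape(t, h2, 2, w2, 2, c)
    patches = patches.transpose(0, 1, 3, 2, 4, 5).reshape(t, h2, w2, 4 * c)
    return np.maximum(patches @ weight, 0.0)


def gcvce_sketch(video_feats, target_feat, stage_weights, context_queries):
    """Hypothetical GCVCE-style step returning (enhanced target, context tokens)."""
    # 1. Cascaded downsampling over every frame of the clip.
    x = video_feats
    for weight in stage_weights:
        x = conv_downsample_2x(x, weight)
    clip_tokens = x.reshape(-1, x.shape[-1])  # (N', C)

    # 2. M learnable queries abstract the clip into M compact global context tokens.
    context = cross_attention(context_queries, clip_tokens)  # (M, C)

    # 3. Embed the context into the target frame via residual cross-attention.
    h, w, c = target_feat.shape
    target_tokens = target_feat.reshape(-1, c)  # (H*W, C)
    enhanced = target_tokens + cross_attention(target_tokens, context)
    return enhanced.reshape(h, w, c), context


rng = np.random.default_rng(0)
T, H, W, C, M = 4, 32, 32, 16, 8
video = rng.standard_normal((T, H, W, C))
target = video[-1]  # treat the last frame as the target frame
stages = [0.1 * rng.standard_normal((4 * C, C)) for _ in range(2)]
queries = rng.standard_normal((M, C))

out, ctx = gcvce_sketch(video, target, stages, queries)
print(out.shape, ctx.shape)  # (32, 32, 16) (8, 16)
```

With these toy sizes the clip has 4 × 32 × 32 = 4096 tokens. Two CCD-style stages reduce this to 4 × 8 × 8 = 256, and the target frame's 1024 tokens attend to only M = 8 context tokens. The embedding step therefore computes a 1024 × 8 attention map instead of 1024 × 4096, which is the efficiency argument behind keeping the context compact.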

Pages: 135589–135600

Number of pages: 12
