Global and Compact Video Context Embedding for Video Semantic Segmentation

Times cited: 0
Authors
Sun, Lei [1 ,2 ]
Liu, Yun [3 ]
Sun, Guolei [2 ]
Wu, Min [3 ]
Xu, Zhijie [4 ]
Wang, Kaiwei [1 ]
Van Gool, Luc [2 ]
Affiliations
[1] Zhejiang Univ, Natl Res Ctr Opt Instrumentat, Hangzhou 310027, Peoples R China
[2] Swiss Fed Inst Technol, Comp Vis Lab, CH-8092 Zurich, Switzerland
[3] ASTAR, Inst Infocomm Res I2R, Singapore 138632, Singapore
[4] Univ Huddersfield, Ctr Visual & Immers Comp, Huddersfield HD1 3DH, England
Source
IEEE ACCESS | 2024 / Vol. 12
Funding
National Natural Science Foundation of China;
Keywords
Semantic segmentation; Context modeling; Feature extraction; Computational modeling; Sun; Optical flow; Shape; Video semantic segmentation; global video context; compact video context; video context embedding; NETWORK;
DOI
10.1109/ACCESS.2024.3409150
CLC classification number
TP [Automation technology, computer technology];
Subject classification code
0812;
Abstract
Intuitively, global video context could benefit video semantic segmentation (VSS) if it is designed to simultaneously model global temporal and spatial dependencies for a holistic understanding of the semantic scenes in a video clip. However, we found that existing VSS approaches focus only on modeling local video context. This paper attempts to bridge this gap by learning global video context for VSS. Apart from being global, the video context should also be compact, given the large number of video feature tokens and the redundancy among nearby video frames. We then embed the learned global and compact video context into the features of the target video frame to improve their distinguishability. The proposed VSS method is dubbed Global and Compact Video Context Embedding (GCVCE). Because the context is compact, the number of global context tokens is very limited, which makes GCVCE flexible and efficient for VSS. Since it may be too challenging to directly abstract a large number of video feature tokens into a small number of global context tokens, we further design a Cascaded Convolutional Downsampling (CCD) module before GCVCE to help it work better. A 1.6% improvement in mIoU over previous state-of-the-art methods on the popular VSPW dataset demonstrates the effectiveness and efficiency of GCVCE and CCD for VSS. Code and models will be made publicly available.
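The abstract does not give GCVCE's exact formulation. The Python sketch below is a rough illustration of why a small set of global context tokens keeps the cost low. It assumes a generic cross-attention design in the spirit of latent-query models, which is not confirmed by the source:

- M hypothetical context queries attend to all T·H·W video tokens.
- The target frame's tokens then attend to those M context tokens.
- The result is added back to the target tokens as a residual.

Several details are illustrative assumptions rather than the authors' method: the function names, the absence of learned projections, the random queries, and all sizes. The CCD module is not modeled.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries, tokens):
    """Single-head scaled dot-product cross-attention.

    Assumption: keys and values are the raw tokens (no learned projections).
    Returns one output vector per query.
    """
    dim = len(queries[0])
    scale = 1.0 / math.sqrt(dim)
    outputs = []
    for q in queries:
        weights = softmax([scale * sum(a * b for a, b in zip(q, k)) for k in tokens])
        outputs.append([sum(w * tok[i] for w, tok in zip(weights, tokens)) for i in range(dim)])
    return outputs


def embed_compact_context(video_tokens, context_queries, target_tokens):
    """Hypothetical two-step global-and-compact context embedding (not the paper's code)."""
    # Step 1: M query tokens summarize all N = T*H*W video tokens (M << N).
    context = cross_attend(context_queries, video_tokens)
    # Step 2: each target-frame token reads from the M context tokens,
    # and the result is added back to the token as a residual.
    read = cross_attend(target_tokens, context)
    enhanced = [[t + r for t, r in zip(tok, upd)] for tok, upd in zip(target_tokens, read)]
    return enhanced, context


def attention_score_counts(num_frames, tokens_per_frame, num_context):
    """Pairwise scores: full spatio-temporal self-attention vs. the compact scheme."""
    n = num_frames * tokens_per_frame
    full = n * n  # every video token scored against every other
    compact = num_context * n + tokens_per_frame * num_context  # step 1 + step 2
    return full, compact


if __name__ == "__main__":
    random.seed(0)
    T, HW, M, D = 4, 16, 8, 6  # frames, tokens per frame, context tokens, channels
    video = [[random.gauss(0.0, 1.0) for _ in range(D)] for _ in range(T * HW)]
    queries = [[random.gauss(0.0, 1.0) for _ in range(D)] for _ in range(M)]
    target = video[-HW:]  # treat the last frame as the target frame
    enhanced, context = embed_compact_context(video, queries, target)
    print(len(context), len(enhanced), len(enhanced[0]))  # 8 16 6
    print(attention_score_counts(T, HW, M))  # (4096, 640)
```

Consider 4 frames of 16 tokens each with 8 context tokens. The sketch computes 640 pairwise scores instead of 4,096. The gap widens as the number of frames or the resolution grows. Full attention scales quadratically in T·H·W, while the compact scheme scales linearly.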
Pages: 135589-135600
Number of pages: 12