Unified Spatio-Temporal Dynamic Routing for Efficient Video Object Segmentation

Authors
Dang, Jisheng [1 ,2 ,3 ]
Zheng, Huicheng [1 ,2 ,3 ]
Xu, Xiaohao [4 ]
Guo, Yulan [5 ,6 ]
Affiliations
[1] Sun Yat Sen Univ, Sch Comp Sci & Engn, Guangzhou 510006, Peoples R China
[2] Sun Yat Sen Univ, Sch Comp Sci & Engn, Guangzhou, Peoples R China
[3] Guangdong Prov Key Lab Informat Secur Technol, Guangzhou 510006, Peoples R China
[4] Univ Michigan, Robot Dept, Ann Arbor, MI 48104 USA
[5] Sun Yat Sen Univ, Sch Elect & Commun Engn, Shenzhen Campus, Shenzhen 518000, Peoples R China
[6] Natl Univ Def Technol, Coll Elect Sci & Technol, Changsha 410073, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Video object segmentation; spatio-temporal dynamic routing; progressive contextual memory enhancement; spatial constraint; temporal consistency;
DOI
10.1109/TITS.2023.3341457
Chinese Library Classification (CLC) number
TU [Architectural Science]
Discipline classification code
0813
Abstract
Existing methods for video object segmentation (VOS) have achieved significant success by exploiting semantic guidance, spatial constraints, or temporal consistency. However, VOS remains highly challenging because it is difficult to leverage spatial constraints, temporal consistency, and semantic guidance jointly while reducing redundant information. In this paper, we propose an efficient unified spatio-temporal dynamic routing (STDR) framework for VOS that achieves a better spatio-temporal balance while avoiding redundancy. Specifically, our unified spatio-temporal modeling contains three paths: 1) a short-term spatial path mines spatial constraints from the previous frame; 2) a long-term semantic path captures semantic cues from the first reference frame, which has ground-truth labels; 3) a memory queue path efficiently exploits the temporal consistency of the intermediate frames through a compact memory bank of constant size. To enhance the input of each path, we introduce a progressive contextual memory enhancement module that builds contextualized memory with growing receptive fields by progressively aggregating spatial contextual information from adjacent frames for each memory frame. Furthermore, we design a dynamic memory-routed module that globally refines the outputs of the three paths for unified modeling. With these modules, STDR achieves state-of-the-art performance at high speed on the DAVIS 2016, DAVIS 2017 Val/Test, YouTube-VOS 2018/2019, and real-world long-video benchmarks.
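The abstract does not say how the three paths are computed or how the routing module fuses them, so the following is only a toy sketch under stated assumptions, not the authors' STDR implementation. It assumes that each path is an attention-style memory read-out (softmax over query-key dot products), that the constant-size memory bank is a first-in, first-out queue, and that "dynamic routing" can be approximated by softmax gate weights derived from each path's best key match. Features are plain Python lists rather than feature maps, and all names (`MemoryQueue`, `attend`, `route`, `stdr_step`) are hypothetical.

```python
import math
from collections import deque


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend(query, keys, values):
    """Assumed memory read-out: values averaged with softmax(query . key) weights."""
    weights = softmax([dot(query, k) for k in keys])
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]


class MemoryQueue:
    """Hypothetical constant-size memory bank for intermediate frames (FIFO)."""

    def __init__(self, capacity):
        self.keys = deque(maxlen=capacity)
        self.values = deque(maxlen=capacity)

    def push(self, key, value):
        # deque(maxlen=...) drops the oldest entry once full, so memory
        # cost stays constant no matter how long the video is.
        self.keys.append(key)
        self.values.append(value)

    def read(self, query):
        return attend(query, list(self.keys), list(self.values))


def route(path_outputs, gate_logits):
    """Hypothetical routing: fuse path outputs with softmax gate weights."""
    gates = softmax(gate_logits)
    dim = len(path_outputs[0])
    return [sum(g * out[d] for g, out in zip(gates, path_outputs)) for d in range(dim)]


def stdr_step(query, first_frame, prev_frame, memory):
    """One toy step; first_frame and prev_frame are (keys, values) pairs."""
    key_sets = [prev_frame[0], first_frame[0], list(memory.keys)]
    outputs = [attend(query, *prev_frame),   # short-term spatial path
               attend(query, *first_frame),  # long-term semantic path
               memory.read(query)]           # memory queue path
    # Assumed gating signal: the best key match found in each path.
    gate_logits = [max(dot(query, k) for k in keys) for keys in key_sets]
    return route(outputs, gate_logits)


# Usage: push four frames into a 2-slot queue; only frames 2 and 3 survive.
memory = MemoryQueue(capacity=2)
for t in range(4):
    memory.push([1.0, float(t)], [float(t)])
first = ([[1.0, 0.0]], [[1.0]])
prev = ([[0.0, 1.0]], [[0.0]])
fused = stdr_step([1.0, 0.5], first, prev, memory)
print(fused)
```

The bounded deque is what keeps the memory queue path's cost fixed as the video grows; the gate weights let whichever path matches the current query best dominate the fused read-out.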
Pages: 4512-4526
Number of pages: 15