Unified Spatio-Temporal Dynamic Routing for Efficient Video Object Segmentation

Authors
Dang, Jisheng [1, 2, 3]
Zheng, Huicheng [1, 2, 3]
Xu, Xiaohao [4]
Guo, Yulan [5, 6]
Affiliations
[1] Sun Yat Sen Univ, Sch Comp Sci & Engn, Guangzhou 510006, Peoples R China
[2] Sun Yat Sen Univ, Sch Comp Sci & Engn, Guangzhou, Peoples R China
[3] Guangdong Prov Key Lab Informat Secur Technol, Guangzhou 510006, Peoples R China
[4] Univ Michigan, Robot Dept, Ann Arbor, MI 48104 USA
[5] Sun Yat Sen Univ, Sch Elect & Commun Engn, Shenzhen Campus, Shenzhen 518000, Peoples R China
[6] Natl Univ Def Technol, Coll Elect Sci & Technol, Changsha 410073, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Video object segmentation; spatio-temporal dynamic routing; progressive contextual memory enhancement; spatial constraint; temporal consistency;
DOI
10.1109/TITS.2023.3341457
Chinese Library Classification number
TU [Building Science]
Discipline classification code
0813
Abstract
Existing methods for video object segmentation (VOS) have achieved significant success by exploiting semantic guidance, spatial constraints, or temporal consistency. However, VOS remains highly challenging because it is difficult to leverage all three jointly while reducing redundant information. In this paper, we propose an efficient unified spatio-temporal dynamic routing (STDR) framework that addresses VOS by achieving a better spatio-temporal balance while avoiding redundancy. Specifically, our unified spatio-temporal modeling contains three paths: 1) a short-term spatial path mines spatial constraints from the previous frame; 2) a long-term semantic path captures semantic cues from the first reference frame with ground-truth labels; 3) a memory queue path efficiently exploits the temporal consistency of middle frames with a compact memory bank of constant size. To enhance the input of each path, we introduce a progressive contextual memory enhancement module that builds contextualized memory with growing receptive fields by progressively aggregating spatial contextual information from adjacent frames for each memory frame. Furthermore, we design a dynamic memory-routed module to globally refine the outputs of the three paths for unified modeling. Enhanced by the proposed modules, STDR achieves state-of-the-art performance at fast speed on the DAVIS 2016, DAVIS 2017 Val/Test, YouTube-VOS 2018/2019, and real-world long-video benchmarks.
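The idea of a compact memory bank of constant size feeding three paths can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `MemoryQueue`, its methods, the `capacity` parameter, and the first-in-first-out eviction policy are all assumptions made for illustration, and the string features stand in for real feature maps.

```python
from collections import deque


class MemoryQueue:
    """Hypothetical constant-size memory bank (illustrative only).

    Keeps the first (reference) frame, the most recent frame, and a
    bounded FIFO queue of intermediate frames, so the amount of stored
    memory does not grow with video length.
    """

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.reference = None                 # first frame (long-term path)
        self.previous = None                  # latest frame (short-term path)
        self.middle = deque(maxlen=capacity)  # bounded queue of middle frames

    def update(self, frame_index, features):
        """Store a processed frame; the oldest middle frame is evicted when full."""
        entry = (frame_index, features)
        if self.reference is None:
            self.reference = entry
            return
        if self.previous is not None:
            # The old "previous" frame becomes a middle frame.
            self.middle.append(self.previous)
        self.previous = entry

    def read(self):
        """Return the inputs that each of the three paths would consume."""
        return {
            "long_term": self.reference,
            "short_term": self.previous,
            "memory_queue": list(self.middle),
        }


bank = MemoryQueue(capacity=3)
for t in range(10):
    bank.update(t, f"feat{t}")
memory = bank.read()
```

However many frames are processed, `memory["memory_queue"]` never holds more than `capacity` entries, which is the property the abstract attributes to the memory queue path. A real system would likely choose which frames to keep more carefully than plain FIFO eviction.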
Pages: 4512-4526
Number of pages: 15