ACCLVOS: Atrous Convolution with Spatial-Temporal ConvLSTM for Video Object Segmentation

Cited by: 0
Authors
Xu, Muzhou [1]
Zhong, Shan [2]
Liu, Chunping [1]
Gong, Shengrong [2]
Wang, Zhaohui [1]
Xia, Yu [2]
Affiliations
[1] Soochow Univ, Sch Comp Sci & Technol, Suzhou, Peoples R China
[2] Changshu Inst Technol, Sch Comp Sci & Engn, Suzhou, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Object segmentation; Semi-supervised learning; Convolution;
DOI
10.1109/ICPR48806.2021.9412128
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Semi-supervised video object segmentation aims to segment the target of interest throughout a video sequence given only the annotated mask of the first frame. A feasible approach is to capture the spatial-temporal coherence between frames. However, this approach may suffer from mask drift when the spatial-temporal coherence is unreliable. To alleviate this problem, we propose an encoder-decoder-recurrent model for semi-supervised video object segmentation. The model adopts a U-shaped architecture that combines atrous convolution and ConvLSTM to establish coherence in both the spatial and temporal domains. Furthermore, the weight ratio for each block is reconstructed to make the model better suited to the VOS task. We evaluate our method on two benchmarks, DAVIS-2017 and YouTube-VOS, where it obtains state-of-the-art segmentation accuracy at a real-time inference speed of 21.3 frames per second on a Tesla P100.
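The atrous (dilated) convolution named in the abstract can be illustrated with a minimal one-dimensional sketch. This is not the authors' implementation: the function names `atrous_conv1d` and `receptive_field`, the toy signal, and the kernel are all assumptions made for illustration. It only shows how a dilation rate widens the receptive field without adding kernel weights.

```python
def receptive_field(kernel_size, rate):
    """Number of input samples spanned by one output of an atrous kernel."""
    return (kernel_size - 1) * rate + 1


def atrous_conv1d(signal, kernel, rate=1):
    """'Valid' 1-D atrous convolution (cross-correlation form, no padding).

    Hypothetical helper for illustration: kernel taps are applied to inputs
    spaced `rate` samples apart, so rate=1 is an ordinary convolution.
    """
    span = receptive_field(len(kernel), rate)
    if len(signal) < span:
        return []
    return [
        sum(w * signal[i + k * rate] for k, w in enumerate(kernel))
        for i in range(len(signal) - span + 1)
    ]


# Toy data (assumed, not from the paper).
signal = [1, 2, 3, 4, 5, 6, 7, 8]
kernel = [1, 0, -1]

dense = atrous_conv1d(signal, kernel, rate=1)    # each output spans 3 samples
dilated = atrous_conv1d(signal, kernel, rate=2)  # each output spans 5 samples
```

With the same three weights, the rate-2 kernel covers five input samples per output instead of three, which is the property that makes atrous convolution useful for capturing wider spatial context at a fixed parameter count.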
Pages: 2089 - 2096
Page count: 8
Related papers
50 in total
  • [1] Comparative histogram: A spatial-temporal segmentation algorithm for video object segmentation
    Su, DW
    Zhou, LL
    Wang, JF
    [J]. Soft Computing as Transdisciplinary Science and Technology, 2005, : 142 - 152
  • [2] Language-Bridged Spatial-Temporal Interaction for Referring Video Object Segmentation
    Ding, Zihan
    Hui, Tianrui
    Huang, Junshi
    Wei, Xiaoming
    Han, Jizhong
    Liu, Si
    [J]. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 4954 - 4963
  • [3] Multi-scale Spatial-Temporal Feature Aggregating for Video Salient Object Segmentation
    Mu, Changhong
    Yuan, Zebin
    Ouyang, Xiuqin
    Wang, Bo
    [J]. 2019 IEEE 4TH INTERNATIONAL CONFERENCE ON SIGNAL AND IMAGE PROCESSING (ICSIP 2019), 2019, : 224 - 229
  • [4] Video Object Detection with an Aligned Spatial-Temporal Memory
    Xiao, Fanyi
    Lee, Yong Jae
    [J]. COMPUTER VISION - ECCV 2018, PT VIII, 2018, 11212 : 494 - 510
  • [5] A video segmentation algorithm based on spatial-temporal information
    Zhu, H
    Li, ZM
    [J]. 2002 INTERNATIONAL CONFERENCE ON COMMUNICATIONS, CIRCUITS AND SYSTEMS AND WEST SINO EXPOSITION PROCEEDINGS, VOLS 1-4, 2002, : 566 - 569
  • [6] Spatial-temporal joint probability images for video segmentation
    Li, ZN
    Zhong, X
    Drew, MS
    [J]. PATTERN RECOGNITION, 2002, 35 (09) : 1847 - 1867
  • [7] Spatial-temporal segmentation scheme for object-oriented video coding based on wavelet and MMRF
    Zheng, L
    Chan, AK
    Liu, JC
    [J]. WAVELET APPLICATIONS IN SIGNAL AND IMAGE PROCESSING VII, 1999, 3813 : 822 - 831
  • [8] ATROUS TEMPORAL CONVOLUTIONAL NETWORK FOR VIDEO ACTION SEGMENTATION
    Wang, Jiahao
    Du, Zhengyin
    Li, Annan
    Wang, Yunhong
    [J]. 2019 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2019, : 1585 - 1589
  • [9] Video foreground segmentation based on analysis of spatial-temporal information
    Min, Hua-Qing
    Chen, Cong
    Luo, Rong-Hua
    Zhu, Jin-Hui
    [J]. Moshi Shibie yu Rengong Zhineng/Pattern Recognition and Artificial Intelligence, 2011, 24 (04) : 582 - 590
  • [10] SPATIAL-TEMPORAL FEATURE AGGREGATION NETWORK FOR VIDEO OBJECT DETECTION
    Chen, Zhu
    Li, Weihai
    Fei, Chi
    Liu, Bin
    Yu, Nenghai
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 1858 - 1862