Bidirectionally Learning Dense Spatio-temporal Feature Propagation Network for Unsupervised Video Object Segmentation

Cited by: 3
Authors
Fan, Jiaqing [1]
Su, Tiankang [2]
Zhang, Kaihua [3]
Liu, Qingshan [3]
Affiliations
[1] Nanjing Univ Aeronaut & Astronaut, Coll Comp Sci & Technol, Nanjing, Peoples R China
[2] Nanjing Univ Aeronaut & Astronaut, Sch Automat, Nanjing, Peoples R China
[3] Nanjing Univ Informat Sci & Technol, Sch Comp & Sci, Engn Res Ctr Digital Forens, Minist Educ, Nanjing, Peoples R China
Keywords
Unsupervised video object segmentation; bidirectional feature propagation; deep learning; feature refinement; direction-aware graph attention
DOI
10.1145/3503161.3548039
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Spatio-temporal feature representation is essential for accurate unsupervised video object segmentation, which requires a feature propagation paradigm that lets appearance and motion features fully exchange information across frames. However, existing solutions mainly focus on forward feature propagation from the preceding frame to the current one, using either the previous segmentation mask or frame-by-frame motion propagation. This ignores the bidirectional temporal feature interactions across all frames (including backward propagation from future frames to the current frame), which can enhance the spatio-temporal feature representation used for segmentation prediction. To this end, this paper presents a novel Dense Bidirectional Spatio-temporal feature propagation Network (DBSNet) that fully integrates forward and backward propagation across all frames. Specifically, a dense bi-ConvLSTM module is first developed to propagate features across all frames in both the forward and backward directions. This captures multi-level spatio-temporal contextual information across all frames, producing a feature representation discriminative enough to separate objects from noisy backgrounds. Next, a spatio-temporal Transformer refinement module further enhances the propagated features by capturing long-range spatio-temporal dependencies among all frames. Afterwards, a Co-operative Direction-aware Graph Attention (Co-DGA) module integrates the propagated appearance and motion cues, yielding a strong spatio-temporal feature representation for segmentation mask prediction. The Co-DGA module assigns attention weights to neighboring points along the coordinate axes, enabling the segmentation model to focus selectively on the most relevant neighbors. Extensive evaluations on four challenging mainstream benchmarks (DAVIS16, FBMS, DAVSOD, and MCL) demonstrate that the proposed DBSNet performs favorably against state-of-the-art methods on all evaluation metrics.
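The bidirectional propagation idea described in the abstract can be illustrated with a toy sketch. This is not the DBSNet implementation: the paper's dense bi-ConvLSTM is replaced here by a plain element-wise recurrent update, and the names `propagate`, `bidirectional_propagate`, and the `decay` parameter are hypothetical, chosen only for illustration. The sketch shows one thing: after a forward pass and a backward pass are concatenated, every frame's feature carries information from both earlier and later frames.

```python
import math

def propagate(frames, decay=0.5):
    """One-directional recurrent pass over a list of per-frame feature vectors.

    Toy stand-in for a ConvLSTM cell (assumption): each state mixes the
    current frame's feature with a decayed copy of the previous state.
    """
    state = [0.0] * len(frames[0])
    outputs = []
    for feat in frames:
        state = [math.tanh(x + decay * h) for x, h in zip(feat, state)]
        outputs.append(state)
    return outputs

def bidirectional_propagate(frames, decay=0.5):
    """Concatenate forward and backward passes for every frame."""
    forward = propagate(frames, decay)
    # Run the same update over the reversed sequence, then restore frame order.
    backward = propagate(frames[::-1], decay)[::-1]
    return [f + b for f, b in zip(forward, backward)]

# Three frames with two feature channels each (made-up values).
frames = [[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]]
features = bidirectional_propagate(frames)
for t, feat in enumerate(features):
    print(t, [round(v, 3) for v in feat])
```

In this toy setup the middle frame has an all-zero input, yet its output is nonzero in both halves: the first half is inherited from frame 0 through the forward pass, the second half from frame 2 through the backward pass. A forward-only scheme would leave frame 0 with no information about later frames.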
Pages: 3646-3655
Number of pages: 10