Dual temporal memory network with high-order spatio-temporal graph learning for video object segmentation

被引:0
|
作者
Fan, Jiaqing [1 ]
Hu, Shenglong [2 ]
Wang, Long [3 ]
Zhang, Kaihua [2 ]
Liu, Bo [4 ]
机构
[1] Soochow Univ, Sch Comp Sci & Technol, Suzhou, Peoples R China
[2] Nanjing Univ Informat Sci & Technol, Sch Comp & Sci, Nanjing, Peoples R China
[3] Guodian Nanjing Automat Co Ltd, Nanjing, Peoples R China
[4] Walmart Global Tech, Sunnyvale, CA USA
关键词
Semi-supervised learning; Video object segmentation; Graph convolution networks; High-order graph learning; Direction-aware attention;
D O I
10.1016/j.imavis.2024.105208
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Typically, Video Object Segmentation (VOS) always has the semi-supervised setting in the testing phase. The VOS aims to track and segment one or several target objects in the following frames in the sequence, only given the ground-truth segmentation mask in the initial frame. A fundamental issue in VOS is how to best utilize the temporal information to improve the accuracy. To address the aforementioned issue, we provide an end-to-end framework that simultaneously extracts long-term and short-term historical sequential information to current frame as temporal memories. The integrated temporal architecture consists of a short-term and a long-term memory modules. Specifically, the short-term memory module leverages a high-order graph-based learning framework to simulate the fine-grained spatial-temporal interactions between local regions across neighboring frames in a video, thereby maintaining the spatio-temporal visual consistency on local regions. Meanwhile, to relieve the occlusion and drift issues, the long-term memory module employs a Simplified Gated Recurrent Unit (S-GRU) to model the long evolutions in a video. Furthermore, we design a novel direction-aware attention module to complementarily augment the object representation for more robust segmentation. Our experiments on three mainstream VOS benchmarks, containing DAVIS 2017, DAVIS 2016, and Youtube-VOS, demonstrate that our proposed solution provides a fair tradeoff performance between both speed and accuracy.
引用
收藏
页数:10
相关论文
共 50 条
  • [1] Video object segmentation using spatio-temporal deep network
    Ramaswamy, Akshaya
    Gubbi, Jayavardhana
    Balamuralidhar, P.
    [J]. 2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2020,
  • [2] Spatio-Temporal Dual-Branch Network With Predictive Feature Learning for Satellite Video Object Segmentation
    Zhong, Yanfei
    Shu, Meng
    Liu, Zhenqi
    Lu, Xiaoyan
    [J]. IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2022, 60
  • [3] Spatio-temporal associative memory and a high-order correlation neuronal network
    Gao, Qian
    Liu, Zili
    [J]. Neural Networks, 1988, 1 (1 SUPPL)
  • [4] Dual Temporal Memory Network for Efficient Video Object Segmentation
    Zhang, Kaihua
    Wang, Long
    Liu, Dong
    Liu, Bo
    Liu, Qingshan
    Li, Zhu
    [J]. MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, 2020, : 1515 - 1523
  • [5] A Novel Spatio-Temporal Video Object Segmentation Algorithm
    Zhu, Shiping
    Xia, Xi
    Zhang, Qingrong
    Belloulata, Kamel
    [J]. 2008 IEEE INTERNATIONAL CONFERENCE ON INDUSTRIAL TECHNOLOGY, VOLS 1-5, 2008, : 1916 - +
  • [6] Efficient probabilistic spatio-temporal video object segmentation
    Ahmed, Rakib
    Karmakar, Gour C.
    Dooley, Laurence S.
    [J]. 6TH IEEE/ACIS INTERNATIONAL CONFERENCE ON COMPUTER AND INFORMATION SCIENCE, PROCEEDINGS, 2007, : 807 - +
  • [7] A spatio-temporal video analysis system for object segmentation
    Xia, JH
    Wang, YL
    [J]. ISPA 2003: PROCEEDINGS OF THE 3RD INTERNATIONAL SYMPOSIUM ON IMAGE AND SIGNAL PROCESSING AND ANALYSIS, PTS 1 AND 2, 2003, : 812 - 815
  • [8] Bidirectionally Learning Dense Spatio-temporal Feature Propagation Network for Unsupervised Video Object Segmentation
    Fan, Jiaqing
    Su, Tiankang
    Zhang, Kaihua
    Liu, Qingshan
    [J]. PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022, : 3646 - 3655
  • [9] Spatio-temporal Attention Network for Video Instance Segmentation
    Liu, Xiaoyu
    Ren, Haibing
    Ye, Tingmeng
    [J]. 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW), 2019, : 725 - 727
  • [10] Interactive object extraction using spatio-temporal video segmentation
    [J]. Okubo, Hidehiko, 1600, Inst. of Image Information and Television Engineers (68):