Unsupervised Video Object Segmentation via Weak User Interaction and Temporal Modulation

Authors
FAN Jiaqing [1]
ZHANG Kaihua [2,3]
ZHAO Yaqian [4]
LIU Qingshan [2,3]
Affiliations
[1] College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics
[2] College of Computer and Software, Nanjing University of Information Science and Technology
[3] Engineering Research Center of Digital Forensics, Ministry of Education
[4] Inspur Suzhou Intelligent Technology Corporation
Funding
National Natural Science Foundation of China
DOI
Not available
CLC number (Chinese Library Classification)
TP391.41
Discipline code
080203
Abstract
In unsupervised video object segmentation (UVOS), the lack of initial prior information can cause the wrong target to be segmented throughout the whole video. In semi-supervised video object segmentation (SVOS), on the other hand, a fine-grained pixel-level mask of the initial frame is essential for good segmentation accuracy, yet providing accurate pixel-level masks for every training sequence is expensive and laborious. To address this issue, we present a weakly user-interactive UVOS approach guided by a simple hand-drawn rectangle annotation in the initial frame. We first interactively draw a rectangle around the region of interest, and then leverage Mask R-CNN (region-based convolutional neural network) to generate a set of coarse reference labels for subsequent mask propagation. To establish temporal correspondence between consecutive frames, we further design two novel temporal modulation modules that enhance the target representation. First, we compute an earth mover's distance (EMD)-based similarity between consecutive frames to mine the objects that co-occur in both images, and use it to modulate the target representation so that the foreground target is highlighted. Second, we design a cross-squeeze temporal modulation module that emphasizes features co-occurring across frames, further enhancing the foreground target representation. Finally, we augment the original representation with the temporally modulated representations to obtain composite spatio-temporal information, yielding a more accurate video object segmentation (VOS) model. Experimental results on both UVOS and SVOS datasets, including DAVIS 2016, FBMS, YouTube-VOS, and DAVIS 2017, show that our method performs favorably in both accuracy and complexity. The related code is available.
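To make the EMD-based co-occurrence idea more concrete, here is a minimal sketch assuming NumPy and SciPy. It is not the authors' released code. Only the overall flow comes from the abstract: EMD between two frames' features marks co-occurring content, that signal modulates the target features, and the result is combined with the original representation. Several details are illustrative assumptions, as are the function names `emd_flow` and `cooccurrence_modulation`: the cosine ground distance, uniform node weights, the per-position score, clipping to [0, 1], and a residual sum for the augmentation. The EMD itself is solved exactly as a transportation linear program with `scipy.optimize.linprog`.

```python
import numpy as np
from scipy.optimize import linprog


def emd_flow(cost: np.ndarray, a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Exact earth mover's distance transport plan, solved as a linear program.

    cost: (N, M) ground-distance matrix; a: (N,) and b: (M,) non-negative
    node weights with equal totals. Returns the optimal flow matrix F (N, M).
    """
    n, m = cost.shape
    # Row constraints: sum_j F[i, j] = a[i]  (F is flattened row-major)
    rows = np.kron(np.eye(n), np.ones((1, m)))
    # Column constraints: sum_i F[i, j] = b[j]
    cols = np.kron(np.ones((1, n)), np.eye(m))
    res = linprog(
        cost.ravel(),
        A_eq=np.vstack([rows, cols]),
        b_eq=np.concatenate([a, b]),
        bounds=(0, None),
        method="highs",
    )
    if not res.success:
        raise RuntimeError(res.message)
    return res.x.reshape(n, m)


def cooccurrence_modulation(feat_cur: np.ndarray, feat_ref: np.ndarray):
    """Hypothetical EMD-based co-occurrence modulation between two frames.

    feat_cur, feat_ref: (C, H, W) feature maps of the current and reference
    frames (assumed small, e.g. pooled to a coarse grid, since the LP has
    (H*W)^2 variables). Returns (augmented features, EMD-based similarity).
    """
    c = feat_cur.shape[0]
    # One local descriptor per spatial position: (N, C)
    x = feat_cur.reshape(c, -1).T
    y = feat_ref.reshape(c, -1).T
    x_n = x / (np.linalg.norm(x, axis=1, keepdims=True) + 1e-8)
    y_n = y / (np.linalg.norm(y, axis=1, keepdims=True) + 1e-8)
    sim = x_n @ y_n.T          # cosine similarity in [-1, 1]
    cost = 1.0 - sim           # ground distance (assumption)

    # Assumption: uniform node weights; the abstract does not specify them.
    a = np.full(x.shape[0], 1.0 / x.shape[0])
    b = np.full(y.shape[0], 1.0 / y.shape[0])
    flow = emd_flow(cost, a, b)

    # Total flow is 1, so this equals 1 - EMD.
    emd_similarity = float(np.sum(flow * sim))

    # Per-position co-occurrence score: weighted mean similarity along the
    # flow leaving position i (each row of the flow sums to a[i]).
    score = (flow * sim).sum(axis=1) / a
    weight = np.clip(score, 0.0, 1.0).reshape(feat_cur.shape[1:])

    # Modulate to highlight co-occurring content, then add the original
    # representation back (residual augmentation; assumption).
    modulated = feat_cur * weight[None, :, :]
    return feat_cur + modulated, emd_similarity
```

If a frame is compared with itself, the optimal flow is diagonal, the similarity is about 1, and every position keeps full weight. Content that has no good match in the reference frame is down-weighted instead. Because the exact LP grows quadratically with the number of positions, a practical version would pool features to a coarse grid first or switch to an approximate transport solver.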
Pages: 507-518
Page count: 12