Towards Robust Video Object Segmentation with Adaptive Object Calibration

Cited by: 10
Authors
Xu, Xiaohao [1, 3]
Wang, Jinglu [2 ]
Ming, Xiang [2 ]
Lu, Yan [2 ]
Affiliations
[1] Huazhong Univ Sci & Technol, Wuhan, Peoples R China
[2] Microsoft Res Asia, Beijing, Peoples R China
[3] MSRA, Beijing, Peoples R China
Keywords
video object segmentation; robustness; neural network;
DOI
10.1145/3503161.3547824
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203; 0835;
Abstract
In the booming video era, video segmentation attracts increasing research attention in the multimedia community. Semi-supervised video object segmentation (VOS) aims at segmenting objects in all target frames of a video, given annotated object masks of reference frames. Most existing methods build pixel-wise reference-target correlations and then perform pixel-wise tracking to obtain target masks. Due to neglecting object-level cues, pixel-level approaches make the tracking vulnerable to perturbations, and even indiscriminate among similar objects. Towards robust VOS, the key insight is to calibrate the representation and mask of each specific object to be expressive and discriminative. Accordingly, we propose a new deep network, which can adaptively construct object representations and calibrate object masks to achieve stronger robustness. First, we construct the object representations by applying an adaptive object proxy (AOP) aggregation method, where the proxies represent arbitrary-shaped segments at multi-levels for reference. Then, prototype masks are initially generated from the reference-target correlations based on AOP. Afterwards, such proto-masks are further calibrated through network modulation, conditioning on the object proxy representations. We consolidate this conditional mask calibration process in a progressive manner, where the object representations and proto-masks evolve to be discriminative iteratively. Extensive experiments are conducted on the standard VOS benchmarks, YouTube-VOS-18/19 and DAVIS-17. Our model achieves the state-of-the-art performance among existing published works, and also exhibits superior robustness against perturbations.
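As a rough illustration of the first two steps named in the abstract (building object proxies from a reference frame, then deriving a proto-mask from reference-target correlations), the following is a minimal sketch and not the authors' implementation. It assumes per-pixel feature vectors are already available, swaps in plain k-means with cosine similarity as a stand-in for the adaptive object proxy aggregation, and omits the calibration stage entirely. The function names `aggregate_proxies` and `proto_mask`, and the parameters `k` and `temperature`, are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + 1e-8)

def aggregate_proxies(feats, k, iters=10):
    """Hypothetical stand-in for proxy aggregation: plain k-means
    (cosine assignment, mean update) over a non-empty list of features."""
    k = min(k, len(feats))
    step = len(feats) // k
    # Deterministic initialisation: evenly spaced samples.
    centers = [list(feats[i * step]) for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for f in feats:
            best = max(range(k), key=lambda j: cosine(f, centers[j]))
            groups[best].append(f)
        for j, g in enumerate(groups):
            if g:  # keep the old center if a cluster becomes empty
                centers[j] = [sum(col) / len(g) for col in zip(*g)]
    return centers

def proto_mask(ref_feats, ref_mask, tgt_feats, k=2, temperature=0.1):
    """Foreground probability per target pixel from proxy correlations.
    Assumes the reference mask contains both object and background pixels."""
    fg = [f for f, m in zip(ref_feats, ref_mask) if m == 1]
    bg = [f for f, m in zip(ref_feats, ref_mask) if m == 0]
    fg_proxies = aggregate_proxies(fg, k)
    bg_proxies = aggregate_proxies(bg, k)
    probs = []
    for f in tgt_feats:
        s_fg = max(cosine(f, p) for p in fg_proxies)
        s_bg = max(cosine(f, p) for p in bg_proxies)
        # Two-way softmax over the best object and background similarity.
        probs.append(1.0 / (1.0 + math.exp((s_bg - s_fg) / temperature)))
    return probs

# Toy usage: four reference pixels (two object, two background), two target pixels.
ref_feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
ref_mask = [1, 1, 0, 0]
tgt_feats = [[1.0, 0.05], [0.05, 1.0]]
probs = proto_mask(ref_feats, ref_mask, tgt_feats)
```

In this toy setup the first target pixel resembles the object features and the second resembles the background, so the first probability comes out high and the second low. In the pipeline the abstract describes, such proto-masks would then be refined by a network conditioned on the proxy representations, which this sketch does not attempt.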
Pages: 2709 - 2718
Page count: 10