Video Object Segmentation with Referring Expressions

Cited by: 1
Authors
Khoreva, Anna [1]
Rohrbach, Anna [2]
Schiele, Bernt [1]
Affiliations
[1] Max Planck Inst Informat, Saarbrucken, Germany
[2] Univ Calif Berkeley, Berkeley, CA 94720 USA
DOI
10.1007/978-3-030-11018-5_2
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Most semi-supervised video object segmentation methods rely on a pixel-accurate mask of the target object provided for the first video frame. However, obtaining a detailed mask is expensive and time-consuming. In this work we explore a more practical and natural way of identifying a target object: language referring expressions. Leveraging recent advances in language grounding models designed for images, we propose an approach that extends them to video data while ensuring temporally coherent predictions. To evaluate our approach, we augment the popular video object segmentation benchmarks DAVIS16 and DAVIS17 with language descriptions of the target objects. We show that our approach performs on par with methods that have access to the object mask on DAVIS16 and is competitive with methods using scribbles on the challenging DAVIS17.
Pages: 7-12
Page count: 6