Temporally Efficient Gabor Transformer for Unsupervised Video Object Segmentation

Cited by: 1
Authors
Fan, Jiaqing [1]
Su, Tiankang [2]
Zhang, Kaihua [3]
Liu, Bo [4]
Liu, Qingshan [5]
Affiliations
[1] Nanjing Univ Aeronaut & Astronaut, Coll Comp Sci & Technol, Nanjing, Peoples R China
[2] Nanjing Univ Informat Sci & Technol, Sch Automat, Nanjing, Peoples R China
[3] Nanjing Univ Informat Sci & Technol, Sch Comp & Sci, Minist Educ, Engn Res Ctr Digital Forens, Nanjing, Peoples R China
[4] Walmart Global Tech, Sunnyvale, CA USA
[5] Nanjing Univ Posts & Telecommun, Sch Comp Sci, Nanjing, Peoples R China
Keywords
Unsupervised video object segmentation; Gabor filtering; Video Transformer; Spatio-temporal information selection;
DOI
10.1145/3581783.3612017
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Spatial-temporal structural details of targets in video (e.g., edges and textures that vary over time) are essential to accurate Unsupervised Video Object Segmentation (UVOS). The vanilla multi-head self-attention in Transformer-based UVOS methods usually concentrates on learning general low-frequency information (e.g., illumination, color) while neglecting high-frequency texture details, leading to unsatisfactory segmentation results. To address this issue, this paper presents a Temporally efficient Gabor Transformer (TGFormer) for UVOS. TGFormer jointly models intra-frame spatial dependencies and inter-frame temporal coherence, which allows it to fully capture the rich structural details needed for accurate UVOS. Concretely, we first propose an effective learnable Gabor filtering Transformer to mine the structural texture details of the object. Then, to adaptively store neighboring historical information, which is largely redundant, we present an efficient dynamic neighboring frame selection module that automatically chooses the useful temporal information; this simultaneously mitigates the effect of blurry frames and reduces the computational burden. Finally, we build the UVOS model as a fully Transformer architecture that aggregates information from the space, Gabor, and time domains, yielding a strong representation with rich structural details. Extensive experiments on five mainstream UVOS benchmarks (DAVIS2016, FBMS, DAVSOD, ViSal, and MCL) demonstrate the superiority of the presented solution over state-of-the-art methods.
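To make the role of Gabor filtering in the abstract more concrete, the following is a minimal sketch of a standard real-valued 2D Gabor kernel and a small bank of orientations. It is not the TGFormer implementation, which this record does not describe: the function names `gabor_kernel` and `gabor_bank`, the default parameter values, and the use of fixed rather than learned parameters are all assumptions made for illustration. In a learnable variant, `sigma`, `theta`, and `lambd` would presumably be trainable parameters.

```python
import numpy as np


def gabor_kernel(size, sigma, theta, lambd, gamma=0.5, psi=0.0):
    """Real part of a 2D Gabor filter (hypothetical helper, not from the paper).

    size   : odd kernel width/height in pixels
    sigma  : standard deviation of the Gaussian envelope
    theta  : orientation of the sinusoidal carrier, in radians
    lambd  : wavelength of the carrier, in pixels
    gamma  : spatial aspect ratio of the envelope
    psi    : phase offset of the carrier
    """
    half = size // 2
    ys, xs = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    # Rotate the coordinate grid by theta.
    x_rot = xs * np.cos(theta) + ys * np.sin(theta)
    y_rot = -xs * np.sin(theta) + ys * np.cos(theta)
    # Gaussian envelope multiplied by a cosine carrier.
    envelope = np.exp(-(x_rot ** 2 + (gamma * y_rot) ** 2) / (2.0 * sigma ** 2))
    carrier = np.cos(2.0 * np.pi * x_rot / lambd + psi)
    return envelope * carrier


def gabor_bank(size=7, n_orient=4, sigma=2.0, lambd=4.0):
    """Stack kernels at n_orient evenly spaced orientations in [0, pi)."""
    thetas = np.arange(n_orient) * np.pi / n_orient
    return np.stack([gabor_kernel(size, sigma, t, lambd) for t in thetas])
```

Each kernel responds most strongly to edges and textures at its own orientation and wavelength, which is why such filters are commonly used to emphasize high-frequency structure that plain self-attention may underweight.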
Pages: 3394-3402
Number of pages: 9
Related papers
50 in total
  • [31] Learning Unsupervised Video Object Segmentation through Visual Attention
    Wang, Wenguan
    Song, Hongmei
    Zhao, Shuyang
    Shen, Jianbing
    Zhao, Sanyuan
    Hoi, Steven C. H.
    Ling, Haibin
    2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 3059 - 3069
  • [32] A neural network based scheme for unsupervised video object segmentation
    Doulamis, AD
    Doulamis, ND
    Kollias, SD
    1998 INTERNATIONAL CONFERENCE ON IMAGE PROCESSING - PROCEEDINGS, VOL 2, 1998, : 632 - 636
  • [33] Unsupervised Video Object Segmentation via Prototype Memory Network
    Yonsei University, Korea, Republic of
    Unknown
    Proc. - IEEE Winter Conf. Appl. Comput. Vis., WACV, 1600, (5913-5923):
  • [34] Unsupervised Video Object Segmentation via Prototype Memory Network
    Lee, Minhyeok
    Cho, Suhwan
    Lee, Seunghoon
    Park, Chaewon
    Lee, Sangyoun
    arXiv, 2022,
  • [35] Unsupervised Video Object Segmentation via Prototype Memory Network
    Lee, Minhyeok
    Cho, Suhwan
    Lee, Seunghoon
    Park, Chaewon
    Lee, Sangyoun
    2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2023, : 5913 - 5923
  • [36] Flow-Edge Guided Unsupervised Video Object Segmentation
    Zhou, Yifeng
    Xu, Xing
    Shen, Fumin
    Zhu, Xiaofeng
    Shen, Heng Tao
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (12) : 8116 - 8127
  • [37] Unsupervised Online Video Object Segmentation With Motion Property Understanding
    Zhuo, Tao
    Cheng, Zhiyong
    Zhang, Peng
    Wong, Yongkang
    Kankanhalli, Mohan
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2020, 29 : 237 - 249
  • [38] Unsupervised video object segmentation using conditional random fields
    Bhatti, Asma Hamza
    Rahman, Anis Ur
    Butt, Asad Anwar
    SIGNAL IMAGE AND VIDEO PROCESSING, 2019, 13 (01) : 9 - 16
  • [39] Learning Motion and Temporal Cues for Unsupervised Video Object Segmentation
    Zhuge, Yunzhi
    Gu, Hongyu
    Zhang, Lu
    Qi, Jinqing
    Lu, Huchuan
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024,
  • [40] TSANet: Temporal and Scale Alignment for Unsupervised Video Object Segmentation
    Lee, Seunghoon
    Cho, Suhwan
    Lee, Dogyoon
    Lee, Minhyeok
    Lee, Sangyoun
    Proceedings - International Conference on Image Processing, ICIP, 2023, : 1535 - 1539