Attention-guided Temporally Coherent Video Object Matting

Cited by: 13
Authors
Zhang, Yunke [1]
Wang, Chi [1]
Cui, Miaomiao [2]
Ren, Peiran [2]
Xie, Xuansong [2]
Hua, Xian-Sheng [3]
Bao, Hujun [1]
Huang, Qixing [4]
Xu, Weiwei [1]
Affiliations
[1] Zhejiang University, Hangzhou, People's Republic of China
[2] Alibaba Group, Hangzhou, People's Republic of China
[3] DAMO Academy, Alibaba Group, Hangzhou, People's Republic of China
[4] University of Texas at Austin, Austin, TX 78712, USA
Funding
National Key Research and Development Program of China
Keywords
datasets; neural networks; video matting; attention mechanism; INTERACTIVE IMAGE; SEGMENTATION;
DOI
10.1145/3474085.3475623
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper proposes a novel deep learning-based video object matting method that achieves temporally coherent matting results. Its key component is an attention-based temporal aggregation module that carries the strength of image matting networks over to video matting networks. This module computes temporal correlations in feature space for pixels adjacent to each other along the time axis, which makes it robust against motion noise. We also design a novel loss term to train the attention weights, which drastically boosts video matting performance. In addition, we show how to effectively solve the trimap generation problem by fine-tuning a state-of-the-art video object segmentation network with a sparse set of user-annotated keyframes. To facilitate the training of video matting and trimap generation networks, we construct a large-scale video matting dataset with 80 training and 28 validation foreground video clips with ground-truth alpha mattes. Experimental results show that our method can generate high-quality alpha mattes for various videos featuring appearance change, occlusion, and fast motion. Our code and dataset can be found at: https://github.com/yunkezhang/TCVOM
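The abstract describes the temporal aggregation module only at a high level, so the following is a minimal illustrative sketch of the general idea, not the authors' implementation. It assumes scaled dot-product attention between each pixel's feature in the current frame and the features in a small spatial window of the adjacent frames, followed by a softmax-weighted sum. The function name `temporal_attention`, the `radius` parameter, and the nested-list feature layout are all hypothetical choices made for this example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_attention(center, neighbors, radius=1):
    """Aggregate adjacent-frame features into the current frame.

    Illustrative assumption, not the paper's exact formulation.
    center:    H x W x C nested list, features of frame t
    neighbors: list of H x W x C nested lists, e.g. frames t-1 and t+1
    radius:    half-size of the spatial window searched in each neighbor
    """
    h, w, c = len(center), len(center[0]), len(center[0][0])
    scale = 1.0 / math.sqrt(c)
    out = [[None] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            query = center[y][x]
            # Collect candidate features near (y, x) in every adjacent frame.
            keys = []
            for frame in neighbors:
                for dy in range(-radius, radius + 1):
                    for dx in range(-radius, radius + 1):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w:
                            keys.append(frame[ny][nx])
            # Correlation in feature space, normalized into attention weights.
            scores = [scale * sum(q * k for q, k in zip(query, key))
                      for key in keys]
            weights = softmax(scores)
            # Attention-weighted sum of the neighboring features.
            out[y][x] = [sum(wt * key[i] for wt, key in zip(weights, keys))
                         for i in range(c)]
    return out


if __name__ == "__main__":
    # Tiny 2 x 2 feature maps with 2 channels, purely synthetic values.
    frame_t = [[[0.5, 0.1], [0.2, 0.9]], [[0.3, 0.3], [0.7, 0.4]]]
    frame_prev = [[[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5], [1.0, 1.0]]]
    frame_next = [[[0.9, 0.1], [0.1, 0.8]], [[0.4, 0.6], [0.8, 0.9]]]
    aggregated = temporal_attention(frame_t, [frame_prev, frame_next])
    print(aggregated[0][0])
```

Because the weights are computed from feature similarity rather than from a fixed pixel correspondence, a pixel whose content shifts slightly between frames can still attend to its matching feature inside the window, which is one way to read the abstract's robustness claim. The special loss term for the attention weights mentioned in the abstract is not modeled here.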
Pages: 5128-5137
Number of pages: 10