Feature pre-inpainting enhanced transformer for video inpainting

Times Cited: 6
Authors
Li, Guanxiao [1 ]
Zhang, Ke [1 ]
Su, Yu [1 ]
Wang, Jingyu [1 ,2 ]
Affiliations
[1] Northwestern Polytech Univ, Sch Astronaut, Xian 710072, Shaanxi, Peoples R China
[2] Northwestern Polytech Univ, Sch Artificial Intelligence Opt & Elect iOPEN, Xian 710072, Shaanxi, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Video inpainting; Feature pre-inpainting; Local-global interleaving transformer;
DOI
10.1016/j.engappai.2023.106323
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology]
Subject Classification Code
0812
Abstract
Transformer-based video inpainting methods aggregate coherent content into missing regions by learning spatial-temporal dependencies. However, existing methods suffer from inaccurate self-attention calculation and excessive quadratic computational complexity, caused respectively by uninformative representations of missing regions and inefficient global self-attention mechanisms. To mitigate these problems, we propose a Feature pre-Inpainting enhanced Transformer (FITer) video inpainting method, in which a feature pre-inpainting network (FPNet) and a local-global interleaving Transformer are designed. The FPNet pre-inpaints missing features before the Transformer by exploiting spatial context, so the representations of missing regions are enhanced with more informative content. The interleaving Transformer can therefore calculate more accurate self-attention weights and learn more effective dependencies between missing and valid regions. Since the interleaving Transformer combines global and window-based local self-attention mechanisms, the proposed FITer method effectively aggregates spatial-temporal features into missing regions while improving efficiency. Experiments on the YouTube-VOS and DAVIS datasets demonstrate that FITer outperforms previous methods both qualitatively and quantitatively.
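As a rough illustration of the pipeline the abstract describes (spatial pre-inpainting of masked features, followed by interleaved window-based local and global self-attention over spatial-temporal tokens), a minimal PyTorch sketch follows. It is not the authors' implementation: the module names FeaturePreInpaint and InterleavedAttentionBlock, the convolutional layout, the head count, and the 4x4 window size are all illustrative assumptions.

# Minimal sketch of the FITer-style pipeline described in the abstract.
# NOT the authors' code: module names, layer choices, and the window size
# are assumptions made only to illustrate the two stages.
import torch
import torch.nn as nn


class FeaturePreInpaint(nn.Module):
    """Hypothetical FPNet-style stage: convolutions propagate valid spatial
    context into masked feature locations before any attention is computed."""
    def __init__(self, dim: int):
        super().__init__()
        self.refine = nn.Sequential(
            nn.Conv2d(dim + 1, dim, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(dim, dim, 3, padding=2, dilation=2), nn.ReLU(inplace=True),
            nn.Conv2d(dim, dim, 3, padding=1),
        )

    def forward(self, feat, mask):
        # feat: (B*T, C, H, W); mask: (B*T, 1, H, W), 1 marks missing regions.
        filled = self.refine(torch.cat([feat * (1 - mask), mask], dim=1))
        # Only missing locations receive the pre-inpainted features.
        return feat * (1 - mask) + filled * mask


class InterleavedAttentionBlock(nn.Module):
    """One interleaving step: window-based local self-attention (cheap, fine
    detail) followed by global self-attention across all spatial-temporal tokens."""
    def __init__(self, dim: int, heads: int = 4, window: int = 4):
        super().__init__()
        self.window = window
        self.local_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.global_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1, self.norm2 = nn.LayerNorm(dim), nn.LayerNorm(dim)

    def forward(self, x, t, h, w):
        # x: (B, T*H*W, C) tokens flattened over time and space.
        b, n, c = x.shape
        ws = self.window
        # Local self-attention inside non-overlapping ws x ws windows per frame.
        xl = x.view(b, t, h // ws, ws, w // ws, ws, c)
        xl = xl.permute(0, 1, 2, 4, 3, 5, 6).reshape(-1, ws * ws, c)
        y = self.norm1(xl)
        xl = xl + self.local_attn(y, y, y, need_weights=False)[0]
        xl = xl.view(b, t, h // ws, w // ws, ws, ws, c)
        x = xl.permute(0, 1, 2, 4, 3, 5, 6).reshape(b, n, c)
        # Global self-attention over every token, across frames and positions.
        y = self.norm2(x)
        return x + self.global_attn(y, y, y, need_weights=False)[0]


if __name__ == "__main__":
    b, t, c, h, w = 1, 3, 32, 16, 16
    feats = torch.randn(b * t, c, h, w)
    masks = (torch.rand(b * t, 1, h, w) > 0.7).float()
    feats = FeaturePreInpaint(c)(feats, masks)
    tokens = feats.view(b, t, c, h * w).permute(0, 1, 3, 2).reshape(b, t * h * w, c)
    out = InterleavedAttentionBlock(c)(tokens, t, h, w)
    print(out.shape)  # torch.Size([1, 768, 32])

The interleaving design matches the efficiency argument in the abstract: the windowed pass keeps attention cost local and small, so the quadratic global pass is needed only once per block to supply long-range, cross-frame dependencies.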
Pages: 12