Feature pre-inpainting enhanced transformer for video inpainting

Times Cited: 6
Authors
Li, Guanxiao [1 ]
Zhang, Ke [1 ]
Su, Yu [1 ]
Wang, Jingyu [1 ,2 ]
Affiliations
[1] Northwestern Polytech Univ, Sch Astronaut, Xian 710072, Shaanxi, Peoples R China
[2] Northwestern Polytech Univ, Sch Artificial Intelligence Opt & Elect iOPEN, Xian 710072, Shaanxi, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Video inpainting; Feature pre-inpainting; Local-global interleaving transformer;
DOI
10.1016/j.engappai.2023.106323
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology]
Subject Classification Code
0812
Abstract
Transformer-based video inpainting methods aggregate coherent content into missing regions by learning spatial-temporal dependencies. However, existing methods suffer from inaccurate self-attention calculation and excessive quadratic computational complexity, caused respectively by uninformative representations of missing regions and inefficient global self-attention mechanisms. To mitigate these problems, we propose a Feature pre-Inpainting enhanced Transformer (FITer) video inpainting method, in which a feature pre-inpainting network (FPNet) and a local-global interleaving Transformer are designed. The FPNet pre-inpaints missing features before the Transformer by exploiting spatial context, so the representations of missing regions are enhanced with more informative content. The interleaving Transformer can therefore calculate more accurate self-attention weights and learn more effective dependencies between missing and valid regions. Since the interleaving Transformer combines global and window-based local self-attention mechanisms, the proposed FITer method effectively aggregates spatial-temporal features into missing regions while improving efficiency. Experiments on the YouTube-VOS and DAVIS datasets demonstrate that FITer outperforms previous methods both qualitatively and quantitatively.
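As a rough illustration of the pipeline the abstract describes (spatial pre-inpainting of masked features, followed by interleaved window-based local and global self-attention over spatial-temporal tokens), a minimal PyTorch sketch follows. It is not the authors' implementation: the module names FeaturePreInpaint and InterleavedAttentionBlock, the convolutional layout, the head count, and the 4x4 window size are all illustrative assumptions.

# Minimal sketch of the FITer-style pipeline described in the abstract.
# NOT the authors' code: module names, layer choices, and the window size
# are assumptions made only to illustrate the two stages.
import torch
import torch.nn as nn


class FeaturePreInpaint(nn.Module):
    """Hypothetical FPNet-style stage: convolutions propagate valid spatial
    context into masked feature locations before any attention is computed."""
    def __init__(self, dim: int):
        super().__init__()
        self.refine = nn.Sequential(
            nn.Conv2d(dim + 1, dim, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(dim, dim, 3, padding=2, dilation=2), nn.ReLU(inplace=True),
            nn.Conv2d(dim, dim, 3, padding=1),
        )

    def forward(self, feat, mask):
        # feat: (B*T, C, H, W); mask: (B*T, 1, H, W), 1 marks missing regions.
        filled = self.refine(torch.cat([feat * (1 - mask), mask], dim=1))
        # Only missing locations receive the pre-inpainted features.
        return feat * (1 - mask) + filled * mask


class InterleavedAttentionBlock(nn.Module):
    """One interleaving step: window-based local self-attention (cheap, fine
    detail) followed by global self-attention across all spatial-temporal tokens."""
    def __init__(self, dim: int, heads: int = 4, window: int = 4):
        super().__init__()
        self.window = window
        self.local_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.global_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1, self.norm2 = nn.LayerNorm(dim), nn.LayerNorm(dim)

    def forward(self, x, t, h, w):
        # x: (B, T*H*W, C) tokens flattened over time and space.
        b, n, c = x.shape
        ws = self.window
        # Local self-attention inside non-overlapping ws x ws windows per frame.
        xl = x.view(b, t, h // ws, ws, w // ws, ws, c)
        xl = xl.permute(0, 1, 2, 4, 3, 5, 6).reshape(-1, ws * ws, c)
        y = self.norm1(xl)
        xl = xl + self.local_attn(y, y, y, need_weights=False)[0]
        xl = xl.view(b, t, h // ws, w // ws, ws, ws, c)
        x = xl.permute(0, 1, 2, 4, 3, 5, 6).reshape(b, n, c)
        # Global self-attention over every token, across frames and positions.
        y = self.norm2(x)
        return x + self.global_attn(y, y, y, need_weights=False)[0]


if __name__ == "__main__":
    b, t, c, h, w = 1, 3, 32, 16, 16
    feats = torch.randn(b * t, c, h, w)
    masks = (torch.rand(b * t, 1, h, w) > 0.7).float()
    feats = FeaturePreInpaint(c)(feats, masks)
    tokens = feats.view(b, t, c, h * w).permute(0, 1, 3, 2).reshape(b, t * h * w, c)
    out = InterleavedAttentionBlock(c)(tokens, t, h, w)
    print(out.shape)  # torch.Size([1, 768, 32])

The interleaving design matches the efficiency argument in the abstract: the windowed pass keeps attention cost local and small, so the quadratic global pass is needed only once per block to supply long-range, cross-frame dependencies.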
Pages: 12