SwinVI: 3D Swin Transformer Model with U-net for Video Inpainting

Cited by: 0
|
Authors
Zhang, Wei [1 ]
Cao, Yang [1 ]
Zhai, Junhai [1 ]
Affiliations
[1] Hebei Univ, Coll Math & Informat Sci, Hebei Key Lab Machine Learning & Computat Intelli, Baoding, Peoples R China
Source
2023 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, IJCNN | 2023
Keywords
Transformer; Video inpainting; Spatio-temporal
DOI
10.1109/IJCNN54540.2023.10192024
CLC Number
TP18 [Theory of Artificial Intelligence]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
The goal of video inpainting is to fill in the locally missing regions of a given video as realistically as possible. It remains a challenging task even with powerful deep learning methods. In recent years, Transformers have been introduced to video inpainting and have achieved remarkable improvements, but they still suffer from two problems: blurry generated textures and high computational cost. To address these two problems, we propose a new 3D Swin Transformer model (SwinVI) with a U-net to improve the quality of video inpainting efficiently. We modify the vanilla Swin Transformer by extending the standard self-attention mechanism to a 3D self-attention mechanism, which enables the modified model to process spatio-temporal information simultaneously. SwinVI adopts a U-net structure built from 3D Patch Merging and a CNN-equipped upsampling module, providing an end-to-end learning framework. This structural design enables SwinVI to focus on background textures and moving objects and to learn robust, more representative token vectors, thereby improving the quality of video inpainting both significantly and efficiently. We experimentally compare SwinVI with multiple methods on two challenging benchmarks. The results demonstrate that the proposed SwinVI outperforms state-of-the-art methods in RMSE, SSIM, and PSNR.
Pages: 8
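The abstract describes extending Swin Transformer's windowed self-attention from 2D to 3D so that attention is computed over local spatio-temporal windows. Below is a minimal, illustrative PyTorch sketch of that idea, not the authors' implementation: attention is restricted to tokens inside non-overlapping (T, H, W) windows, which keeps the cost low while still mixing information across frames. The module name, window size, tensor layout, and the omission of shifted windows and relative position bias are assumptions made for illustration.

```python
# Hypothetical sketch of 3D windowed self-attention (not the SwinVI source code).
import torch
import torch.nn as nn


class WindowAttention3D(nn.Module):
    """Multi-head self-attention restricted to non-overlapping 3D windows."""

    def __init__(self, dim: int, window_size=(2, 7, 7), num_heads: int = 4):
        super().__init__()
        self.window_size = window_size          # (Wt, Wh, Ww)
        self.num_heads = num_heads
        self.scale = (dim // num_heads) ** -0.5
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, T, H, W, C) token grid; T, H, W are assumed divisible by the window size.
        B, T, H, W, C = x.shape
        Wt, Wh, Ww = self.window_size

        # Partition the token grid into non-overlapping (Wt, Wh, Ww) windows.
        x = x.reshape(B, T // Wt, Wt, H // Wh, Wh, W // Ww, Ww, C)
        windows = x.permute(0, 1, 3, 5, 2, 4, 6, 7).reshape(-1, Wt * Wh * Ww, C)

        # Standard multi-head attention, but only among tokens of the same window,
        # so each token attends jointly over space and time within its local window.
        Bn, N, _ = windows.shape
        qkv = self.qkv(windows).reshape(Bn, N, 3, self.num_heads, C // self.num_heads)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)    # each: (Bn, heads, N, C // heads)
        attn = (q @ k.transpose(-2, -1)) * self.scale
        out = (attn.softmax(dim=-1) @ v).transpose(1, 2).reshape(Bn, N, C)
        out = self.proj(out)

        # Reverse the window partition back to the (B, T, H, W, C) layout.
        out = out.reshape(B, T // Wt, H // Wh, W // Ww, Wt, Wh, Ww, C)
        return out.permute(0, 1, 4, 2, 5, 3, 6, 7).reshape(B, T, H, W, C)


if __name__ == "__main__":
    # Toy check: 4 frames of a 28x28 token grid with 32-dim tokens.
    tokens = torch.randn(1, 4, 28, 28, 32)
    print(WindowAttention3D(dim=32)(tokens).shape)  # torch.Size([1, 4, 28, 28, 32])
```

In the full model described by the abstract, blocks of this kind would be stacked inside a U-net whose encoder downsamples tokens via 3D Patch Merging and whose decoder restores resolution with a CNN-equipped upsampling module.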