TempFormer: Temporally Consistent Transformer for Video Denoising

Cited by: 8

Authors:

Song, Mingyang [1,2]

Zhang, Yang [2]

Aydin, Tunc O. [2]

Affiliations:

[1] Swiss Fed Inst Technol, Zurich, Switzerland

[2] DisneyRes Studios, Zurich, Switzerland

Source:

COMPUTER VISION, ECCV 2022, PT XIX | 2022 / Vol. 13679

Keywords:

Video denoising; Transformer; Temporal consistency;

DOI:

10.1007/978-3-031-19800-7_28

Chinese Library Classification (CLC) number:

TP18 [Theory of artificial intelligence];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Video denoising is a low-level vision task that aims to restore high-quality videos from noisy content. The Vision Transformer (ViT) is a new machine learning architecture that has shown promising performance on both high-level and low-level image tasks. In this paper, we propose a modified ViT architecture for video processing tasks, introducing a new training strategy and loss function to enhance temporal consistency without compromising spatial quality. Specifically, we propose an efficient hybrid Transformer-based model, TempFormer, which is composed of Spatio-Temporal Transformer Blocks (STTB) and 3D convolutional layers. The proposed STTB implicitly learns the temporal information between neighboring frames by using the proposed Joint Spatio-Temporal Mixer module for attention calculation and feature aggregation in each ViT block. Moreover, existing methods suffer from temporal inconsistency artifacts that are problematic in practical cases and distracting to viewers. We propose a sliding block strategy with a recurrent architecture and use a new loss term, Overlap Loss, to alleviate flickering between adjacent frames. Our method produces state-of-the-art spatio-temporal denoising quality with significantly improved temporal coherency, and requires fewer computational resources than competing methods to achieve comparable denoising quality (Fig. 1).
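
The abstract names a sliding block strategy and an Overlap Loss without defining them. The following is a minimal sketch, under stated assumptions, of how overlapping temporal blocks and a penalty on their disagreement could fit together. It assumes the denoiser processes a fixed-size block of frames and returns one denoised frame per input frame, that consecutive blocks share frames, and that the penalty is a mean absolute (L1) difference on those shared frames. The helper names `sliding_blocks` and `overlap_loss` and the parameters `block_size` and `stride` are hypothetical; the network and its recurrent connection are omitted, and this is not the authors' formulation.

```python
from typing import List, Tuple

# Illustrative only: a "frame" is a flat list of pixel values.
Frame = List[float]
Block = Tuple[int, int]  # [start, end) frame indices (hypothetical representation)


def sliding_blocks(num_frames: int, block_size: int, stride: int) -> List[Block]:
    """Split frame indices 0..num_frames-1 into overlapping temporal blocks.

    Consecutive blocks share at least block_size - stride frames; the last block
    is shifted back so it ends exactly at num_frames, so it may overlap more.
    """
    if not 0 < stride <= block_size <= num_frames:
        raise ValueError("require 0 < stride <= block_size <= num_frames")
    blocks: List[Block] = []
    start = 0
    while start + block_size < num_frames:
        blocks.append((start, start + block_size))
        start += stride
    blocks.append((num_frames - block_size, num_frames))
    return blocks


def overlap_loss(out_a: List[Frame], block_a: Block,
                 out_b: List[Frame], block_b: Block) -> float:
    """Mean absolute (L1) difference between two blocks' outputs on shared frames.

    Assumption: each block's output holds one denoised frame per input frame,
    in temporal order. Returns 0.0 when the blocks share no frames.
    """
    lo = max(block_a[0], block_b[0])
    hi = min(block_a[1], block_b[1])
    total, count = 0.0, 0
    for t in range(lo, hi):  # t is the global index of a shared frame
        frame_a = out_a[t - block_a[0]]
        frame_b = out_b[t - block_b[0]]
        for pa, pb in zip(frame_a, frame_b):
            total += abs(pa - pb)
            count += 1
    return total / count if count else 0.0


if __name__ == "__main__":
    blocks = sliding_blocks(num_frames=10, block_size=5, stride=3)
    print(blocks)  # [(0, 5), (3, 8), (5, 10)]
    # Toy outputs with two pixels per frame; in training these would come from
    # the denoiser applied to each block. Blocks (0, 5) and (3, 8) share frames 3, 4.
    out_a = [[0.0, 0.0] for _ in range(5)]
    out_b = [[0.5, 0.5] for _ in range(5)]
    print(overlap_loss(out_a, blocks[0], out_b, blocks[1]))  # 0.5
```

In a training loop, a term like this would be added to the per-frame reconstruction loss so that the two versions of a frame produced by neighboring blocks are pushed to agree, which targets the block-to-block flicker the abstract describes; how the paper actually defines and weights its Overlap Loss is not stated in this record.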

Pages: 481-496

Page count: 16
