TempFormer: Temporally Consistent Transformer for Video Denoising

Cited by: 8
Authors
Song, Mingyang [1,2]
Zhang, Yang [2]
Aydin, Tunc O. [2]
Affiliations
[1] Swiss Federal Institute of Technology (ETH Zurich), Zurich, Switzerland
[2] DisneyResearch|Studios, Zurich, Switzerland
Keywords
Video denoising; Transformer; Temporal consistency
DOI
10.1007/978-3-031-19800-7_28
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Video denoising is a low-level vision task that aims to restore high-quality videos from noisy content. The Vision Transformer (ViT) is a recent machine learning architecture that has shown promising performance on both high-level and low-level image tasks. In this paper, we propose a modified ViT architecture for video processing tasks, introducing a new training strategy and loss function to enhance temporal consistency without compromising spatial quality. Specifically, we propose an efficient hybrid Transformer-based model, TempFormer, which combines Spatio-Temporal Transformer Blocks (STTB) with 3D convolutional layers. The proposed STTB implicitly learns the temporal information between neighboring frames by using the proposed Joint Spatio-Temporal Mixer module for attention computation and feature aggregation in each ViT block. Moreover, existing methods suffer from temporal inconsistency artifacts that are problematic in practical use and distracting to viewers. We propose a sliding block strategy with a recurrent architecture and introduce a new loss term, the Overlap Loss, to alleviate flickering between adjacent frames. Our method achieves state-of-the-art spatio-temporal denoising quality with significantly improved temporal coherency, and requires fewer computational resources than competing methods to achieve comparable denoising quality (Fig. 1).
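As a rough illustration of how the sliding-block Overlap Loss described in the abstract could be realized, the minimal PyTorch sketch below penalizes disagreement between the denoised outputs of two adjacent temporal blocks on the frames they share. The function name, the (batch, time, channels, height, width) tensor layout, and the choice of an L1 distance are assumptions made for illustration; they are not taken from the paper itself.

```python
# Minimal sketch (assumptions noted above; not the paper's exact formulation):
# encourage two adjacent sliding blocks to agree on the frames they both cover,
# which discourages flicker at block boundaries.
import torch
import torch.nn.functional as F


def overlap_consistency_loss(prev_block_out: torch.Tensor,
                             next_block_out: torch.Tensor,
                             overlap: int) -> torch.Tensor:
    """L1 distance between the frames shared by two adjacent temporal blocks.

    prev_block_out, next_block_out: denoised outputs, shape (B, T, C, H, W).
    overlap: number of frames shared between the two blocks.
    """
    shared_prev = prev_block_out[:, -overlap:]  # last `overlap` frames of block i
    shared_next = next_block_out[:, :overlap]   # first `overlap` frames of block i+1
    return F.l1_loss(shared_prev, shared_next)


if __name__ == "__main__":
    # Toy example: two 5-frame blocks that overlap by 2 frames.
    a = torch.rand(1, 5, 3, 64, 64)
    b = torch.rand(1, 5, 3, 64, 64)
    print(overlap_consistency_loss(a, b, overlap=2).item())
```

In training, such a term would typically be added to the spatial reconstruction loss with a weighting factor, so that temporal agreement on overlapping frames is encouraged without sacrificing per-frame quality.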
Pages: 481 - 496
Page count: 16
Related Papers
50 items in total
  • [31] Generation of Temporally Consistent Depth Maps Using Noise Removal from Video
    Stankiewicz, Olgierd
    Wegner, Krzysztof
    COMPUTER VISION AND GRAPHICS, PT II, 2010, 6375 : 292 - 299
  • [32] ESTIMATION OF TEMPORALLY-CONSISTENT DEPTH MAPS FROM VIDEO WITH REDUCED NOISE
    Stankiewicz, Olgierd
    Domanski, Marek
    Wegner, Krzysztof
    2015 3DTV-CONFERENCE - TRUE VISION - CAPTURE, TRANSMISSION AND DISPLAY OF 3D VIDEO (3DTV-CON), 2015,
  • [33] Multi-class video segmentation based on temporally consistent energy model
    Bing, Liu
    Advances in Information Sciences and Service Sciences, 2012, 4 (01) : 85 - 92
  • [34] Online Temporally Consistent Indoor Depth Video Enhancement via Static Structure
    Sheng, Lu
    Ngan, King Ngi
    Lim, Chern-Loon
    Li, Songnan
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2015, 24 (07) : 2197 - 2211
  • [35] Temporally Consistent Superpixels
    Reso, Matthias
    Jachalsky, Joern
    Rosenhahn, Bodo
    Ostermann, Joern
    2013 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2013 : 385 - 392
  • [36] A temporally-aware noise-informed invertible network for progressive video denoising
    Huang, Yan
    Luo, Huixin
    Xu, Yong
    Meng, Xian-Bing
    Image and Vision Computing, 2025, 154
  • [37] A Temporally-Aware Noise-Informed Invertible Network for Progressive Video Denoising
    South China University of Technology, China
    [Further bibliographic details not specified]
  • [38] Temporally consistent video colorization with deep feature propagation and self-regularization learning
    Liu, Yihao
    Zhao, Hengyuan
    Chan, Kelvin C. K.
    Wang, Xintao
    Loy, Chen Change
    Qiao, Yu
    Dong, Chao
    Computational Visual Media, 2024, 10 : 375 - 395
  • [39] Temporally Consistent Depth Map Estimation for 3D Video Generation and Coding
    Lee, Sang-Beom
    Ho, Yo-Sung
    CHINA COMMUNICATIONS, 2013, 10 (05) : 39 - 49
  • [40] Spatio-temporally Consistent Multi-view Video Synthesis for Autostereoscopic Displays
    Lin, Shu-Jyuan
    Cheng, Chia-Ming
    Lai, Shang-Hong
    ADVANCES IN MULTIMEDIA INFORMATION PROCESSING - PCM 2009, 2009, 5879 : 532 - 542