TempFormer: Temporally Consistent Transformer for Video Denoising

Cited by: 8
Authors
Song, Mingyang [1 ,2 ]
Zhang, Yang [2 ]
Aydin, Tunc O. [2 ]
Affiliations
[1] Swiss Federal Institute of Technology (ETH Zurich), Zurich, Switzerland
[2] DisneyResearch|Studios, Zurich, Switzerland
Source
Computer Vision - ECCV 2022 (Lecture Notes in Computer Science), Springer
Keywords
Video denoising; Transformer; Temporal consistency
DOI
10.1007/978-3-031-19800-7_28
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Video denoising is a low-level vision task that aims to restore high-quality videos from noisy content. The Vision Transformer (ViT) is a recently introduced machine learning architecture that has shown promising performance on both high-level and low-level image tasks. In this paper, we propose a modified ViT architecture for video processing tasks, introducing a new training strategy and loss function to enhance temporal consistency without compromising spatial quality. Specifically, we propose an efficient hybrid Transformer-based model, TempFormer, which combines Spatio-Temporal Transformer Blocks (STTB) with 3D convolutional layers. Each STTB learns the temporal information between neighboring frames implicitly by using the proposed Joint Spatio-Temporal Mixer module for attention calculation and feature aggregation in each ViT block. Moreover, existing methods suffer from temporal inconsistency artifacts that are problematic in practical cases and distracting to viewers. We propose a sliding-block strategy with a recurrent architecture, and use a new loss term, the Overlap Loss, to alleviate flickering between adjacent frames. Our method produces state-of-the-art spatio-temporal denoising quality with significantly improved temporal coherency, and requires fewer computational resources than competing methods to achieve comparable denoising quality.
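A note on the Overlap Loss mentioned in the abstract: its exact formulation is not given here, so the sketch below only illustrates the general idea under stated assumptions: an L1 penalty on the frames that two adjacent sliding blocks both denoise, which is one plausible way to discourage flicker at block boundaries. The function name overlap_consistency_loss, the tensor layout (B, T, C, H, W), and the choice of L1 are illustrative assumptions, not TempFormer's published implementation.

    import torch

    def overlap_consistency_loss(block_a, block_b, overlap):
        # block_a, block_b: denoised outputs of two adjacent sliding blocks,
        # shaped (B, T, C, H, W); the last `overlap` frames of block_a cover
        # the same time steps as the first `overlap` frames of block_b.
        tail = block_a[:, -overlap:]   # shared frames from the first block
        head = block_b[:, :overlap]    # the same time steps, re-denoised in the next block
        # L1 penalty on the doubly denoised frames; the norm actually used by
        # the paper's Overlap Loss is an assumption here.
        return torch.mean(torch.abs(tail - head))

    # Toy usage: two 5-frame blocks whose last/first 2 frames overlap in time.
    a = torch.rand(1, 5, 3, 64, 64)
    b = torch.rand(1, 5, 3, 64, 64)
    print(overlap_consistency_loss(a, b, overlap=2).item())

In a training loop, such a term would typically be weighted and added to a per-frame fidelity loss, so that consecutive sliding blocks agree on their shared frames.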
Pages: 481-496
Page count: 16
Related Papers
50 records in total (items 21-30 shown)
  • [21] TEMPORALLY CONSISTENT KEY FRAME SELECTION FROM VIDEO FOR FACE RECOGNITION
    Saeed, Usman
    Dugelay, Jean-Luc
    18TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO-2010), 2010, : 1311 - 1315
  • [22] Temporally Consistent Motion Segmentation From RGB-D Video
    Bertholet, P.
    Ichim, A. E.
    Zwicker, M.
    COMPUTER GRAPHICS FORUM, 2018, 37 (06) : 118 - 134
  • [23] Video completion via spatio-temporally consistent motion inpainting
    Information Processing Society of Japan (06)
  • [24] VORNet: Spatio-temporally Consistent Video Inpainting for Object Removal
    Chang, Ya-Liang
    Liu, Zhe Yu
    Hsu, Winston
    2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPRW 2019), 2019, : 1785 - 1794
  • [25] Temporally consistent depth video filter using temporal outlier reduction
    Lee, Sangbeom
    Ho, Yo-Sung
    SIGNAL IMAGE AND VIDEO PROCESSING, 2015, 9 (06) : 1401 - 1408
  • [26] Spatio-temporally consistent video processing for local backlight dimming
    Muijs, Remo
    Langendijk, Erno
    Vossen, Frank
    2008 SID INTERNATIONAL SYMPOSIUM, DIGEST OF TECHNICAL PAPERS, VOL XXXIX, BOOKS I-III, 2008, 39 : 979 - 982
  • [28] Region-based Temporally Consistent Video Post-processing
    Dong, Xuan
    Bonev, Boyan
    Zhu, Yu
    Yuille, Alan L.
    2015 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2015, : 714 - 722
  • [29] Hybrid Skeleton Driven Surface Registration for Temporally Consistent Volumetric Video
    Regateiro, Joao
    Volino, Marco
    Hilton, Adrian
    2018 INTERNATIONAL CONFERENCE ON 3D VISION (3DV), 2018, : 514 - 522
  • [30] Video OWL-ViT: Temporally-consistent open-world localization in video
    Heigold, Georg
    Minderer, Matthias
    Gritsenko, Alexey
    Bewley, Alex
    Keysers, Daniel
    Lucic, Mario
    Yu, Fisher
    Kipf, Thomas
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 13756 - 13765