Blur-aware Spatio-temporal Sparse Transformer for Video Deblurring

Cited by: 2
Authors
Zhang, Huicong [1 ]
Xie, Haozhe [2 ]
Yao, Hongxun [1 ]
Affiliations
[1] Harbin Institute of Technology, Harbin, People's Republic of China
[2] Nanyang Technological University, S-Lab, Singapore
Source
2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2024)
DOI: 10.1109/CVPR52733.2024.00258
Chinese Library Classification: TP18 [Artificial Intelligence Theory]
Subject Classification Codes: 081104; 0812; 0835; 1405
Abstract
Video deblurring relies on leveraging information from other frames in the video sequence to restore the blurred regions of the current frame. Mainstream approaches employ bidirectional feature propagation, spatio-temporal transformers, or a combination of both to extract information from the video sequence. However, memory and computational constraints limit the temporal window length of the spatio-temporal transformer, preventing the extraction of longer-range temporal contextual information from the video sequence. Additionally, bidirectional feature propagation is highly sensitive to inaccurate optical flow in blurry frames, leading to error accumulation during propagation. To address these issues, we propose BSSTNet, a Blur-aware Spatio-temporal Sparse Transformer Network. It introduces a blur map that converts the originally dense attention into a sparse form, enabling more extensive use of information from the entire video sequence. Specifically, BSSTNet (1) uses a longer temporal window in the transformer, leveraging information from more distant frames to restore the blurry pixels in the current frame, and (2) introduces bidirectional feature propagation guided by blur maps, which reduces the error accumulation caused by blurry frames. Experimental results demonstrate that the proposed BSSTNet outperforms state-of-the-art methods on the GoPro and DVD datasets.
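The two mechanisms described in the abstract (blur-map-sparsified attention and blur-guided bidirectional propagation) can be made concrete with a minimal PyTorch sketch. This is not the authors' released code: the function names (blur_sparse_attention, blur_gated_fuse), the 0.5 threshold, the single-head attention, and the flattened token layout are all illustrative assumptions.

# Hypothetical sketch only; BSSTNet's actual design may differ.
import torch
import torch.nn.functional as F

def blur_sparse_attention(feats, blur_map, blur_threshold=0.5):
    """Blur-aware sparse attention over flattened spatio-temporal tokens.

    feats:    (N, C) tokens from the whole temporal window (N = T*H*W).
    blur_map: (N,) values in [0, 1]; higher is assumed to mean blurrier.
    """
    out = feats.clone()
    q_idx = (blur_map >= blur_threshold).nonzero(as_tuple=True)[0]  # blurry tokens to restore
    kv_idx = (blur_map < blur_threshold).nonzero(as_tuple=True)[0]  # sharp tokens as sources
    if q_idx.numel() == 0 or kv_idx.numel() == 0:
        return out  # nothing to restore, or no sharp sources to draw from
    q = feats[q_idx]                              # (Nq, C)
    k = v = feats[kv_idx]                         # (Nk, C)
    scores = q @ k.t() / (q.shape[-1] ** 0.5)     # (Nq, Nk) scaled dot product
    out[q_idx] = F.softmax(scores, dim=-1) @ v    # rewrite blurry tokens from sharp ones
    return out

def blur_gated_fuse(curr_feat, warped_prev_feat, prev_blur_map):
    """Blur-map-guided propagation step: downweight features warped from a
    blurry neighbor so that optical-flow errors do not accumulate.

    curr_feat, warped_prev_feat: (N, C); prev_blur_map: (N,) in [0, 1].
    """
    gate = (1.0 - prev_blur_map).unsqueeze(-1)    # sharp regions -> gate near 1
    return curr_feat + gate * warped_prev_feat

Because only sharp tokens enter the key/value set, the attention cost scales with the number of sharp tokens rather than with the full dense window, which is what would permit a longer temporal window under the same memory budget.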
Pages: 2673-2681
Page count: 9
Related Papers
50 items in total
  • [21] Spatio-Temporal Graph Convolution Transformer for Video Question Answering
    Tang, Jiahao
    Hu, Jianguo
    Huang, Wenjun
    Shen, Shengzhi
    Pan, Jiakai
    Wang, Deming
    Ding, Yanyu
    IEEE ACCESS, 2024, 12 : 131664 - 131680
  • [22] Patch-Based Spatio-Temporal Deformable Attention BiRNN for Video Deblurring
    Zhang, Huicong
    Xie, Haozhe
    Zhang, Shengping
    Yao, Hongxun
    IEEE Transactions on Circuits and Systems for Video Technology,
  • [23] Efficient Spatio-Temporal Feature Extraction Recurrent Neural Network for Video Deblurring
    Pu Z.
    Ma W.
    Mi Q.
Jisuanji Fuzhu Sheji Yu Tuxingxue Xuebao/Journal of Computer-Aided Design and Computer Graphics, 2023, 35(11): 1720-1730
  • [24] Cross-scale hierarchical spatio-temporal transformer for video enhancement
    Jiang, Qin
    Wang, Qinglin
    Chi, Lihua
    Liu, Jie
    KNOWLEDGE-BASED SYSTEMS, 2025, 309
  • [25] Video saliency detection by spatio-temporal sampling and sparse matrix decomposition
    Pan, Yunfeng
    Jiang, Qiuping
    Li, Zhutuan
    Shao, Feng
    WSEAS Transactions on Computers, 2014, 13 : 520 - 527
  • [26] Scalable spatio-temporal video indexing using sparse multiscale patches
    Piro, Paolo
    Anthoine, Sandrine
    Debreuve, Eric
    Barlaud, Michel
    CBMI: 2009 INTERNATIONAL WORKSHOP ON CONTENT-BASED MULTIMEDIA INDEXING, 2009, : 95 - 100
  • [27] Point Spatio-Temporal Transformer Networks for Point Cloud Video Modeling
    Fan, Hehe
    Yang, Yi
    Kankanhalli, Mohan
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (02) : 2181 - 2192
  • [28] Flow-Guided Sparse Transformer for Video Deblurring
    Lin, Jing
    Cai, Yuanhao
    Hu, Xiaowan
    Wang, Haoqian
    Yan, Youliang
    Zou, Xueyi
    Ding, Henghui
    Zhang, Yulun
    Timofte, Radu
    Van Gool, Luc
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022,
  • [29] DANet: A spatio-temporal dynamics and Detail Aware Network for video prediction
    Huang, Huilin
    Guan, YePeng
    NEUROCOMPUTING, 2024, 598
  • [30] Video Captioning With Object-Aware Spatio-Temporal Correlation and Aggregation
    Zhang, Junchao
    Peng, Yuxin
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2020, 29 (29) : 6209 - 6222