Spatial-Temporal Transformer for Video Snapshot Compressive Imaging

Cited by: 28
Authors
Wang, Lishun [1, 2]
Cao, Miao [3, 4]
Zhong, Yong [1, 2]
Yuan, Xin [3, 4]
Affiliations
[1] Chinese Acad Sci, Chengdu Inst Computer Applicat, Chengdu 610041, Sichuan, Peoples R China
[2] Univ Chinese Acad Sci, Beijing 100049, Peoples R China
[3] Westlake Univ, Res Ctr Ind Future Res, Hangzhou 310030, Peoples R China
[4] Westlake Univ, Sch Engn, Hangzhou 310030, Peoples R China
Keywords
Attention; coded aperture compressive temporal imaging (CACTI); compressive sensing; convolutional neural networks; deep learning; snapshot compressive imaging; transformer; MODEL;
DOI
10.1109/TPAMI.2022.3225382
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104; 0812; 0835; 1405
Abstract
Video snapshot compressive imaging (SCI) captures multiple sequential video frames in a single measurement using the idea of computational imaging. The underlying principle is to modulate high-speed frames through different masks; these modulated frames are then summed into a single measurement captured by a low-speed 2D sensor (dubbed the optical encoder). Following this, algorithms are employed to reconstruct the desired high-speed frames (dubbed the software decoder) if needed. In this article, we consider the reconstruction algorithm in video SCI, i.e., recovering a series of video frames from a compressed measurement. Specifically, we propose a Spatial-Temporal transFormer (STFormer) to exploit the correlation in both spatial and temporal domains. The STFormer network is composed of a token generation block and a video reconstruction block, connected by a series of STFormer blocks. Each STFormer block consists of a spatial self-attention branch and a temporal self-attention branch, whose outputs are integrated by a fusion network. Extensive results on both simulated and real data demonstrate the state-of-the-art performance of STFormer. The code and models are publicly available at https://github.com/ucaswangls/STFormer.
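The optical-encoder step described in the abstract (each high-speed frame is multiplied element-wise by its own mask, and the modulated frames are summed into one 2D measurement) can be sketched as below. This is a minimal illustration in plain Python, not code from the STFormer repository: the function name `sci_measurement`, the 2x2 frame size, and the binary mask values are assumptions chosen only to make the arithmetic easy to follow. It covers the encoder side only; the reconstruction network is not shown.

```python
def sci_measurement(frames, masks):
    """Return the single 2D measurement Y = sum_t (M_t * X_t).

    frames: list of T frames, each a list of rows (H x W).
    masks:  list of T masks with the same shape as the frames.
    The multiplication is element-wise, one mask per frame.
    """
    if len(frames) != len(masks):
        raise ValueError("need exactly one mask per frame")
    height, width = len(frames[0]), len(frames[0][0])
    measurement = [[0.0] * width for _ in range(height)]
    for frame, mask in zip(frames, masks):
        for i in range(height):
            for j in range(width):
                # Modulate the pixel with its mask value, then accumulate.
                measurement[i][j] += mask[i][j] * frame[i][j]
    return measurement


# Hypothetical toy data: T = 2 frames of size 2 x 2 with binary masks.
frames = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[5.0, 6.0], [7.0, 8.0]],
]
masks = [
    [[1, 0], [0, 1]],
    [[0, 1], [1, 1]],
]

print(sci_measurement(frames, masks))  # [[1.0, 6.0], [7.0, 12.0]]
```

The measurement has the size of one frame regardless of how many frames are summed, which is why the decoder has to recover T frames from a single compressed image.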
Pages: 9072-9089
Page count: 18