Spatial-Temporal Transformer for Video Snapshot Compressive Imaging

Cited by: 28
Authors
Wang, Lishun [1, 2]
Cao, Miao [3, 4]
Zhong, Yong [1, 2]
Yuan, Xin [3, 4]
Affiliations
[1] Chinese Acad Sci, Chengdu Inst Computer Applicat, Chengdu 610041, Sichuan, Peoples R China
[2] Univ Chinese Acad Sci, Beijing 100049, Peoples R China
[3] Westlake Univ, Res Ctr Ind Future Res, Hangzhou 310030, Peoples R China
[4] Westlake Univ, Sch Engn, Hangzhou 310030, Peoples R China
Keywords
Attention; coded aperture compressive temporal imaging (CACTI); compressive sensing; convolutional neural networks; deep learning; snapshot compressive imaging; transformer; MODEL;
DOI
10.1109/TPAMI.2022.3225382
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Video snapshot compressive imaging (SCI) captures multiple sequential video frames in a single measurement using the idea of computational imaging. The underlying principle is to modulate high-speed frames with different masks and sum the modulated frames into a single measurement captured by a low-speed 2D sensor (dubbed the optical encoder); algorithms are then employed to reconstruct the desired high-speed frames (dubbed the software decoder) if needed. In this article, we consider the reconstruction algorithm in video SCI, i.e., recovering a series of video frames from a compressed measurement. Specifically, we propose a Spatial-Temporal transFormer (STFormer) to exploit the correlation in both spatial and temporal domains. The STFormer network is composed of a token generation block and a video reconstruction block, connected by a series of STFormer blocks. Each STFormer block consists of a spatial self-attention branch and a temporal self-attention branch, whose outputs are integrated by a fusion network. Extensive results on both simulated and real data demonstrate the state-of-the-art performance of STFormer. The code and models are publicly available at https://github.com/ucaswangls/STFormer.
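The optical encoding step described in the abstract (modulate each high-speed frame with its own mask, then sum into one 2D measurement) can be illustrated with a minimal NumPy sketch. This is not taken from the STFormer repository: the function name `sci_forward`, the random binary masks, the frame count of 8, and the 4x4 frame size are all assumptions chosen for illustration, and sensor noise is ignored.

```python
import numpy as np


def sci_forward(frames: np.ndarray, masks: np.ndarray) -> np.ndarray:
    """Hypothetical helper: sum mask-modulated frames into one 2D measurement.

    frames and masks both have shape (T, H, W); the result has shape (H, W).
    """
    if frames.shape != masks.shape:
        raise ValueError("frames and masks must have the same (T, H, W) shape")
    # Element-wise modulation of every frame, followed by a sum over time.
    return (frames * masks).sum(axis=0)


# Illustrative sizes only (assumed, not from the paper): 8 frames of 4x4 pixels.
rng = np.random.default_rng(0)
T, H, W = 8, 4, 4
frames = rng.random((T, H, W))
# Random binary masks are an assumption; the abstract only says "different masks".
masks = rng.integers(0, 2, size=(T, H, W)).astype(frames.dtype)

measurement = sci_forward(frames, masks)
```

Here `measurement` is the single compressed snapshot; the reconstruction problem the paper addresses is the inverse step, recovering all `T` frames from `measurement` and the known `masks`.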
Pages: 9072-9089
Number of pages: 18
Related papers
50 in total
  • [31] Multi-Scale Spatial-Temporal Transformer: A Novel Framework for Spatial-Temporal Edge Data Prediction
    Ming, Junhao
    Zhang, Dongmei
    Han, Wei
    APPLIED SCIENCES-BASEL, 2023, 13 (17):
  • [32] Weakly-supervised spatial-temporal video grounding via spatial-temporal annotation on a frame
    Luo, Shu
    Jiang, Shijie
    Cao, Da
    Deng, Huangxiao
    Wang, Jiawei
    Qin, Zheng
    KNOWLEDGE-BASED SYSTEMS, 2025, 314
  • [33] Unsupervised Low-Light Video Enhancement With Spatial-Temporal Co-Attention Transformer
    Lv, Xiaoqian
    Zhang, Shengping
    Wang, Chenyang
    Zhang, Weigang
    Yao, Hongxun
    Huang, Qingming
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2023, 32 : 4701 - 4715
  • [34] Video Snapshot Compressive Imaging Using Residual Ensemble Network
    Sun, Yubao
    Chen, Xunhao
    Kankanhalli, Mohan S.
    Liu, Qingshan
    Li, Junxia
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (09) : 5931 - 5943
  • [35] Adaptive Deep PnP Algorithm for Video Snapshot Compressive Imaging
    Wu, Zongliang
    Yang, Chengshuai
    Su, Xiongfei
    Yuan, Xin
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2023, 131 (07) : 1662 - 1679
  • [36] Plug-and-Play Algorithms for Video Snapshot Compressive Imaging
    Yuan, Xin
    Liu, Yang
    Suo, Jinli
    Durand, Fredo
    Dai, Qionghai
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2022, 44 (10) : 7093 - 7111
  • [37] Adaptive Deep PnP Algorithm for Video Snapshot Compressive Imaging
    Zongliang Wu
    Chengshuai Yang
    Xiongfei Su
    Xin Yuan
    International Journal of Computer Vision, 2023, 131 : 1662 - 1679
  • [38] End-to-End Video Snapshot Compressive Imaging using Video Transformers
    Saideni, Wael
    Courreges, Fabien
    Helbert, David
    Cances, Jean Pierre
    2022 ELEVENTH INTERNATIONAL CONFERENCE ON IMAGE PROCESSING THEORY, TOOLS AND APPLICATIONS (IPTA), 2022,
  • [39] Dual-Window Multiscale Transformer for Hyperspectral Snapshot Compressive Imaging
    Luo, Fulin
    Chen, Xi
    Gong, Xiuwen
    Wu, Weiwen
    Guo, Tan
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 4, 2024, : 3972 - 3980
  • [40] Snapshot spectral compressive imaging reconstruction using convolution and contextual Transformer
    Wang, Lishun
    Wu, Zongliang
    Zhong, Yong
    Yuan, Xin
    Photonics Research, 2022, (08) : 1848 - 1858