Spatial-Temporal Transformer for Video Snapshot Compressive Imaging

Times cited: 28

Authors:

Wang, Lishun [1,2]

Cao, Miao [3,4]

Zhong, Yong [1,2]

Yuan, Xin [3,4]

Affiliations:

[1] Chinese Acad Sci, Chengdu Inst Computer Applicat, Chengdu 610041, Sichuan, Peoples R China

[2] Univ Chinese Acad Sci, Beijing 100049, Peoples R China

[3] Westlake Univ, Res Ctr Ind Future Res, Hangzhou 310030, Peoples R China

[4] Westlake Univ, Sch Engn, Hangzhou 310030, Peoples R China

Source:

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE | 2023, Vol. 45, No. 7

Keywords:

Attention; coded aperture compressive temporal imaging (CACTI); compressive sensing; convolutional neural networks; deep learning; snapshot compressive imaging; transformer; MODEL;

DOI:

10.1109/TPAMI.2022.3225382

Chinese Library Classification (CLC) number:

TP18 [Theory of artificial intelligence];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Video snapshot compressive imaging (SCI) uses the idea of computational imaging to capture multiple sequential video frames in a single measurement. The underlying principle is to modulate high-speed frames with different masks and sum the modulated frames into a single measurement captured by a low-speed 2D sensor (dubbed the optical encoder); algorithms are then employed to reconstruct the desired high-speed frames (dubbed the software decoder) if needed. In this article, we consider the reconstruction algorithm in video SCI, i.e., recovering a series of video frames from a compressed measurement. Specifically, we propose a Spatial-Temporal transFormer (STFormer) to exploit the correlation in both the spatial and temporal domains. The STFormer network is composed of a token generation block and a video reconstruction block, which are connected by a series of STFormer blocks. Each STFormer block consists of a spatial self-attention branch and a temporal self-attention branch, whose outputs are integrated by a fusion network. Extensive results on both simulated and real data demonstrate the state-of-the-art performance of STFormer. The code and models are publicly available at https://github.com/ucaswangls/STFormer.
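The optical-encoder step described in the abstract (each high-speed frame is modulated by its own mask, and the modulated frames are summed into one 2D measurement) can be sketched as below. This is a minimal illustration assuming a (B, H, W) array layout, element-wise masks, and optional additive Gaussian noise; the function name `sci_forward` and its parameters are hypothetical and are not taken from the STFormer code base. It models only the measurement process, not the STFormer decoder.

```python
import numpy as np


def sci_forward(frames: np.ndarray, masks: np.ndarray,
                noise_std: float = 0.0, seed: int = 0) -> np.ndarray:
    """Simulate one video SCI measurement (hypothetical helper).

    frames: (B, H, W) array holding B high-speed frames (assumed layout).
    masks:  (B, H, W) array holding one modulation mask per frame.
    Returns an (H, W) measurement: sum over b of masks[b] * frames[b],
    plus optional zero-mean Gaussian noise with std `noise_std`.
    """
    if frames.shape != masks.shape:
        raise ValueError("frames and masks must have the same shape (B, H, W)")
    # Modulate each frame element-wise by its own mask, then integrate
    # over the temporal axis, as a low-speed sensor would during one exposure.
    measurement = np.sum(masks * frames, axis=0)
    if noise_std > 0:
        rng = np.random.default_rng(seed)
        measurement = measurement + rng.normal(0.0, noise_std, size=measurement.shape)
    return measurement


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    B, H, W = 8, 4, 4                      # B frames compressed into 1 snapshot
    frames = rng.random((B, H, W))
    masks = rng.integers(0, 2, size=(B, H, W)).astype(float)  # random binary masks
    y = sci_forward(frames, masks)
    print(y.shape)  # (4, 4)
```

Here B frames map to a single H x W snapshot, so the compression ratio is B:1; a reconstruction algorithm such as STFormer must invert this many-to-one mapping given the known masks.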


Pages: 9072-9089

Number of pages: 18

50 records in total

[1] Snapshot spatial-temporal compressive imaging
Qiao, Mu
Liu, Xuan
Yuan, Xin
OPTICS LETTERS, 2020, 45 (07) : 1659 - 1662
[2] Provable deep video denoiser using spatial-temporal information for video snapshot compressive imaging: Algorithm and convergence analysis
Shi, Baoshun
Li, Dan
Wang, Yuxin
Su, Yueming
Lian, Qiusheng
SIGNAL PROCESSING, 2024, 214
[3] Hierarchical Separable Video Transformer for Snapshot Compressive Imaging
Wang, Ping
Zhang, Yulun
Wang, Lishun
Yuan, Xin
COMPUTER VISION - ECCV 2024, PT LXXXI, 2025, 15139 : 104 - 122
[4] Perceptual Spatial-temporal Video Compressive Sensing Network
Liu, Wan
Xie, Xuemei
Zhao, Zhifu
Shi, Guangming
ELEVENTH INTERNATIONAL CONFERENCE ON GRAPHICS AND IMAGE PROCESSING (ICGIP 2019), 2020, 11373
[5] ShiftFormer: Spatial-Temporal Shift Operation in Video Transformer
Yang, Beiying
Zhu, Guibo
Ge, Guojing
Luo, Jinzhao
Wang, Jinqiao
2023 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, ICME, 2023, : 1895 - 1900
[6] Learning a spatial-temporal texture transformer network for video inpainting
Ma, Pengsen
Xue, Tao
FRONTIERS IN NEUROROBOTICS, 2022, 16
[7] ISTVT: Interpretable Spatial-Temporal Video Transformer for Deepfake Detection
Zhao, Cairong
Wang, Chutian
Hu, Guosheng
Chen, Haonan
Liu, Chun
Tang, Jinhui
IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, 2023, 18 : 1335 - 1348
[8] Transformer-Based Cascading Reconstruction Network for Video Snapshot Compressive Imaging
Wen, Jiaxuan
Huang, Junru
Chen, Xunhao
Huang, Kaixuan
Sun, Yubao
APPLIED SCIENCES-BASEL, 2023, 13 (10):
[9] Spatial-temporal Graph Transformer Network for Spatial-temporal Forecasting
Dao, Minh-Son
Zetsu, Koji
Hoang, Duy-Tang
Proceedings - 2024 IEEE International Conference on Big Data, BigData 2024, 2024, : 1276 - 1281
[10] Two-step spatial-temporal compressive sensing imaging
Zhao, Dingaoyu
Ke, Jun
ADVANCED OPTICAL IMAGING TECHNOLOGIES IV, 2021, 11896
