Spatial-Temporal Transformer for Video Snapshot Compressive Imaging

Times cited: 28

Authors:

Wang, Lishun [1,2]

Cao, Miao [3,4]

Zhong, Yong [1,2]

Yuan, Xin [3,4]

Affiliations:

[1] Chinese Acad Sci, Chengdu Inst Computer Applicat, Chengdu 610041, Sichuan, Peoples R China

[2] Univ Chinese Acad Sci, Beijing 100049, Peoples R China

[3] Westlake Univ, Res Ctr Ind Future Res, Hangzhou 310030, Peoples R China

[4] Westlake Univ, Sch Engn, Hangzhou 310030, Peoples R China

Source:

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE | 2023 / Vol. 45 / Issue 07

Keywords:

Attention; coded aperture compressive temporal imaging (CACTI); compressive sensing; convolutional neural networks; deep learning; snapshot compressive imaging; transformer; MODEL;

DOI:

10.1109/TPAMI.2022.3225382

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104; 0812; 0835; 1405;

Abstract:

Video snapshot compressive imaging (SCI) captures multiple sequential video frames in a single measurement using the idea of computational imaging. The underlying principle is to modulate high-speed frames with different masks and sum the modulated frames into a single measurement captured by a low-speed 2D sensor (dubbed the optical encoder); algorithms are then employed, if needed, to reconstruct the desired high-speed frames (dubbed the software decoder). In this article, we consider the reconstruction algorithm in video SCI, i.e., recovering a series of video frames from a compressed measurement. Specifically, we propose a Spatial-Temporal transFormer (STFormer) to exploit the correlation in both the spatial and temporal domains. The STFormer network is composed of a token generation block and a video reconstruction block, which are connected by a series of STFormer blocks. Each STFormer block consists of a spatial self-attention branch and a temporal self-attention branch, whose outputs are integrated by a fusion network. Extensive results on both simulated and real data demonstrate the state-of-the-art performance of STFormer. The code and models are publicly available at https://github.com/ucaswangls/STFormer.
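The encoder/decoder pipeline summarized in the abstract can be illustrated with a small NumPy sketch. The forward model is the standard video SCI one, Y = Σ_b C_b ⊙ X_b (+ noise), with B frames X_b, masks C_b and elementwise product ⊙. Everything else is an assumption for illustration: binary random masks, a least-norm back-projection as the initial estimate (a common choice in SCI work, not necessarily what STFormer's token generation block consumes), and a heavily simplified, parameter-free stand-in for an STFormer block that uses global attention, no learned Q/K/V projections, and a plain average in place of the learned fusion network. All function names are hypothetical; the authors' actual implementation is in the repository linked above.

```python
import numpy as np

def sci_forward(frames, masks, noise_std=0.0, rng=None):
    """Video SCI encoder (hypothetical helper): modulate B frames by B masks and sum.

    frames, masks: arrays of shape (B, H, W). Returns an (H, W) measurement.
    """
    y = np.sum(masks * frames, axis=0)
    if noise_std > 0:
        rng = np.random.default_rng() if rng is None else rng
        y = y + rng.normal(0.0, noise_std, size=y.shape)
    return y

def init_estimate(y, masks, eps=1e-8):
    """Least-norm estimate x0 = Phi^T (Phi Phi^T)^{-1} y (assumed initialization).

    For video SCI, Phi Phi^T is diagonal with entries sum_b masks[b]**2,
    so the inverse reduces to an elementwise division.
    """
    norm = np.sum(masks ** 2, axis=0)
    return masks * (y / (norm + eps))[None]

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x):
    """Parameter-free scaled dot-product self-attention over axis -2.

    x: (..., N, D) -> (..., N, D), with Q = K = V = x (no learned projections).
    """
    d = x.shape[-1]
    scores = x @ np.swapaxes(x, -1, -2) / np.sqrt(d)
    return softmax(scores, axis=-1) @ x

def spatial_temporal_block(tokens):
    """Toy factorized spatial/temporal attention on tokens of shape (T, H, W, D)."""
    T, H, W, D = tokens.shape
    # Spatial branch: attend among the H*W positions inside each frame.
    spatial = self_attention(tokens.reshape(T, H * W, D)).reshape(T, H, W, D)
    # Temporal branch: attend among the T frames at each spatial position.
    per_pixel = tokens.transpose(1, 2, 0, 3).reshape(H * W, T, D)
    temporal = self_attention(per_pixel).reshape(H, W, T, D).transpose(2, 0, 1, 3)
    # Fusion: a plain average plus a residual connection stands in for the
    # learned fusion network described in the abstract.
    return tokens + 0.5 * (spatial + temporal)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frames = rng.random((8, 16, 16))
    masks = (rng.random((8, 16, 16)) > 0.5).astype(float)
    y = sci_forward(frames, masks)
    x0 = init_estimate(y, masks)
    out = spatial_temporal_block(rng.standard_normal((8, 4, 4, 6)))
    print(y.shape, x0.shape, out.shape)
```

Splitting attention into a spatial and a temporal branch keeps the cost at roughly O(T·(HW)²) + O(HW·T²) instead of O((T·HW)²) for joint attention over all space-time tokens, which is one motivation for factorized designs of this kind.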

Pages: 9072-9089

Number of pages: 18
