Spatial-Temporal Transformer for Video Snapshot Compressive Imaging

Cited by: 28

Authors:

Wang, Lishun [1,2]

Cao, Miao [3,4]

Zhong, Yong [1,2]

Yuan, Xin [3,4]

Affiliations:

[1] Chinese Acad Sci, Chengdu Inst Computer Applicat, Chengdu 610041, Sichuan, Peoples R China

[2] Univ Chinese Acad Sci, Beijing 100049, Peoples R China

[3] Westlake Univ, Res Ctr Ind Future Res, Hangzhou 310030, Peoples R China

[4] Westlake Univ, Sch Engn, Hangzhou 310030, Peoples R China

Source:

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE | 2023 / Vol. 45 / No. 07

Keywords:

Attention; coded aperture compressive temporal imaging (CACTI); compressive sensing; convolutional neural networks; deep learning; snapshot compressive imaging; transformer; MODEL;

DOI:

10.1109/TPAMI.2022.3225382

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Video snapshot compressive imaging (SCI) captures multiple sequential video frames in a single measurement using the idea of computational imaging. The underlying principle is to modulate high-speed frames with different masks and sum the modulated frames into a single measurement captured by a low-speed 2D sensor (dubbed the optical encoder); algorithms are then employed to reconstruct the desired high-speed frames (dubbed the software decoder) if needed. In this article, we consider the reconstruction algorithm in video SCI, i.e., recovering a series of video frames from a compressed measurement. Specifically, we propose a Spatial-Temporal transFormer (STFormer) to exploit the correlation in both spatial and temporal domains. The STFormer network is composed of a token generation block and a video reconstruction block, connected by a series of STFormer blocks. Each STFormer block consists of a spatial self-attention branch and a temporal self-attention branch, and the outputs of these two branches are integrated by a fusion network. Extensive results on both simulated and real data demonstrate the state-of-the-art performance of STFormer. The code and models are publicly available at https://github.com/ucaswangls/STFormer.
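
The optical encoding step described in the abstract (each high-speed frame is modulated by its own mask, and the modulated frames are summed into one 2D measurement) can be illustrated with a small sketch. This is not the authors' code: the function name sci_measure, the toy sizes, the random frames, and the choice of binary masks are all assumptions made for illustration, and plain Python lists are used only to keep the example dependency-free.

```python
import random


def sci_measure(frames, masks):
    """Sum mask-modulated frames into one 2D measurement.

    frames, masks: lists of T frames, each an H x W list of lists.
    (Hypothetical helper, not part of the STFormer repository.)
    """
    if len(frames) != len(masks):
        raise ValueError("need one mask per frame")
    height, width = len(frames[0]), len(frames[0][0])
    measurement = [[0.0] * width for _ in range(height)]
    for frame, mask in zip(frames, masks):
        for i in range(height):
            for j in range(width):
                # Element-wise modulation, then accumulation on the sensor.
                measurement[i][j] += mask[i][j] * frame[i][j]
    return measurement


# Assumed toy sizes: T = 8 frames of 4 x 4 pixels with values in [0, 1).
rng = random.Random(0)
T, H, W = 8, 4, 4
frames = [[[rng.random() for _ in range(W)] for _ in range(H)] for _ in range(T)]
# Assumed binary masks; the abstract only says the masks differ per frame.
masks = [[[rng.randint(0, 1) for _ in range(W)] for _ in range(H)] for _ in range(T)]
measurement = sci_measure(frames, masks)
```

The reconstruction problem the article addresses is the inverse of this step: recovering all T frames from the single H x W measurement and the known masks.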


Pages: 9072-9089

Page count: 18
