Spatial-Temporal Transformer for Video Snapshot Compressive Imaging

Cited by: 28

Authors:

Wang, Lishun [1,2]

Cao, Miao [3,4]

Zhong, Yong [1,2]

Yuan, Xin [3,4]

Affiliations:

[1] Chinese Acad Sci, Chengdu Inst Computer Applicat, Chengdu 610041, Sichuan, Peoples R China

[2] Univ Chinese Acad Sci, Beijing 100049, Peoples R China

[3] Westlake Univ, Res Ctr Ind Future Res, Hangzhou 310030, Peoples R China

[4] Westlake Univ, Sch Engn, Hangzhou 310030, Peoples R China

Source:

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE | 2023 / Vol. 45 / No. 07

Keywords:

Attention; coded aperture compressive temporal imaging (CACTI); compressive sensing; convolutional neural networks; deep learning; snapshot compressive imaging; transformer; MODEL;

DOI:

10.1109/TPAMI.2022.3225382

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Video snapshot compressive imaging (SCI) captures multiple sequential video frames by a single measurement using the idea of computational imaging. The underlying principle is to modulate high-speed frames through different masks and these modulated frames are summed to a single measurement captured by a low-speed 2D sensor (dubbed optical encoder); following this, algorithms are employed to reconstruct the desired high-speed frames (dubbed software decoder) if needed. In this article, we consider the reconstruction algorithm in video SCI, i.e., recovering a series of video frames from a compressed measurement. Specifically, we propose a Spatial-Temporal transFormer (STFormer) to exploit the correlation in both spatial and temporal domains. STFormer network is composed of a token generation block, a video reconstruction block, and these two blocks are connected by a series of STFormer blocks. Each STFormer block consists of a spatial self-attention branch, a temporal self-attention branch and the outputs of these two branches are integrated by a fusion network. Extensive results on both simulated and real data demonstrate the state-of-the-art performance of STFormer. The code and models are publicly available at https://github.com/ucaswangls/STFormer.
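The optical encoding step described in the abstract (each high-speed frame is modulated by its own mask, and the modulated frames are summed into one 2D measurement) can be illustrated with a minimal sketch. The function name `sci_measurement`, the random binary masks, and the toy sizes are assumptions made for illustration; they are not taken from the paper or its repository, and the sketch does not implement the STFormer reconstruction itself.

```python
import random


def sci_measurement(frames, masks):
    """Sum mask-modulated frames into one 2D measurement.

    frames, masks: lists of T frames, each an H x W nested list.
    Returns an H x W nested list Y with Y[i][j] = sum_t masks[t][i][j] * frames[t][i][j].
    """
    if len(frames) != len(masks):
        raise ValueError("need exactly one mask per frame")
    height, width = len(frames[0]), len(frames[0][0])
    measurement = [[0.0] * width for _ in range(height)]
    for frame, mask in zip(frames, masks):
        for i in range(height):
            for j in range(width):
                # Element-wise modulation followed by accumulation over time
                measurement[i][j] += mask[i][j] * frame[i][j]
    return measurement


# Toy example (hypothetical sizes): 4 frames of 2 x 3 pixels
rng = random.Random(0)
T, H, W = 4, 2, 3
frames = [[[rng.random() for _ in range(W)] for _ in range(H)] for _ in range(T)]
masks = [[[rng.randint(0, 1) for _ in range(W)] for _ in range(H)] for _ in range(T)]

y = sci_measurement(frames, masks)  # single H x W snapshot encoding T frames
```

Here `T` frames collapse into one `H` x `W` snapshot, so a reconstruction network such as the one the abstract proposes has to recover `T` times as many values as it observes, using the known masks.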


Pages: 9072 - 9089

Number of pages: 18

