Spatial-Temporal Transformer for Video Snapshot Compressive Imaging

Cited by: 28
Authors
Wang, Lishun [1, 2]
Cao, Miao [3, 4]
Zhong, Yong [1, 2]
Yuan, Xin [3, 4]
Affiliations
[1] Chinese Acad Sci, Chengdu Inst Computer Applicat, Chengdu 610041, Sichuan, Peoples R China
[2] Univ Chinese Acad Sci, Beijing 100049, Peoples R China
[3] Westlake Univ, Res Ctr Ind Future Res, Hangzhou 310030, Peoples R China
[4] Westlake Univ, Sch Engn, Hangzhou 310030, Peoples R China
Keywords
Attention; coded aperture compressive temporal imaging (CACTI); compressive sensing; convolutional neural networks; deep learning; snapshot compressive imaging; transformer; MODEL;
DOI
10.1109/TPAMI.2022.3225382
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Video snapshot compressive imaging (SCI) captures multiple sequential video frames in a single measurement using the idea of computational imaging. The underlying principle is to modulate high-speed frames with different masks and sum the modulated frames into a single measurement captured by a low-speed 2D sensor (dubbed the optical encoder); algorithms are then employed to reconstruct the desired high-speed frames (dubbed the software decoder) if needed. In this article, we consider the reconstruction algorithm in video SCI, i.e., recovering a series of video frames from a compressed measurement. Specifically, we propose a Spatial-Temporal transFormer (STFormer) to exploit the correlation in both spatial and temporal domains. The STFormer network is composed of a token generation block and a video reconstruction block, connected by a series of STFormer blocks. Each STFormer block consists of a spatial self-attention branch and a temporal self-attention branch, whose outputs are integrated by a fusion network. Extensive results on both simulated and real data demonstrate the state-of-the-art performance of STFormer. The code and models are publicly available at https://github.com/ucaswangls/STFormer.
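The optical encoding step described in the abstract (mask each high-speed frame, then sum the masked frames into one 2D measurement) can be illustrated with a minimal sketch. This is not the authors' code: the function name `sci_measurement`, the toy sizes, and the random binary masks are assumptions chosen for illustration, and the sketch omits sensor noise and the reconstruction network entirely.

```python
import random


def sci_measurement(frames, masks):
    """Collapse T frames of size H x W into one H x W measurement.

    Each frame is multiplied element-wise by its own mask, and the
    modulated frames are summed over the temporal axis.
    """
    if len(frames) != len(masks):
        raise ValueError("need exactly one mask per frame")
    height, width = len(frames[0]), len(frames[0][0])
    return [
        [sum(m[i][j] * f[i][j] for f, m in zip(frames, masks)) for j in range(width)]
        for i in range(height)
    ]


# Toy data (assumed sizes): 8 frames of 4 x 4 pixels with values in [0, 1),
# and one random binary mask per frame.
rng = random.Random(0)
T, H, W = 8, 4, 4
frames = [[[rng.random() for _ in range(W)] for _ in range(H)] for _ in range(T)]
masks = [[[rng.randint(0, 1) for _ in range(W)] for _ in range(H)] for _ in range(T)]

measurement = sci_measurement(frames, masks)
print(len(measurement), len(measurement[0]))  # 4 4
```

The single `measurement` array has the spatial size of one frame, which is why the decoder has to recover T frames from one compressed snapshot.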
Pages: 9072-9089
Page count: 18