Spatial-Temporal Transformer for Video Snapshot Compressive Imaging

Cited by: 28
Authors
Wang, Lishun [1, 2]
Cao, Miao [3, 4]
Zhong, Yong [1, 2]
Yuan, Xin [3, 4]
Affiliations
[1] Chinese Acad Sci, Chengdu Inst Computer Applicat, Chengdu 610041, Sichuan, Peoples R China
[2] Univ Chinese Acad Sci, Beijing 100049, Peoples R China
[3] Westlake Univ, Res Ctr Ind Future Res, Hangzhou 310030, Peoples R China
[4] Westlake Univ, Sch Engn, Hangzhou 310030, Peoples R China
Keywords
Attention; coded aperture compressive temporal imaging (CACTI); compressive sensing; convolutional neural networks; deep learning; snapshot compressive imaging; transformer; MODEL;
DOI
10.1109/TPAMI.2022.3225382
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Video snapshot compressive imaging (SCI) captures multiple sequential video frames in a single measurement using the idea of computational imaging. The underlying principle is to modulate high-speed frames with different masks and sum the modulated frames into a single measurement captured by a low-speed 2D sensor (dubbed the optical encoder); algorithms are then employed to reconstruct the desired high-speed frames (dubbed the software decoder) when needed. In this article, we consider the reconstruction algorithm in video SCI, i.e., recovering a series of video frames from a compressed measurement. Specifically, we propose a Spatial-Temporal transFormer (STFormer) to exploit the correlation in both the spatial and temporal domains. The STFormer network is composed of a token generation block and a video reconstruction block, connected by a series of STFormer blocks. Each STFormer block consists of a spatial self-attention branch and a temporal self-attention branch, whose outputs are integrated by a fusion network. Extensive results on both simulated and real data demonstrate the state-of-the-art performance of STFormer. The code and models are publicly available at https://github.com/ucaswangls/STFormer.
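The optical-encoder step described in the abstract (modulate each high-speed frame with its own mask, then sum the results into one 2D measurement) can be illustrated with a small sketch. This is only a toy illustration of that sentence, not the authors' code: the function names `sci_measurement` and `random_binary_masks`, the use of binary masks, and the tiny frame sizes are all assumptions made for the example, and the reconstruction network (STFormer) is not shown.

```python
import random


def sci_measurement(frames, masks):
    """Sum mask-modulated frames into a single 2D measurement.

    frames, masks: lists of T items, each an H x W nested list.
    (Hypothetical helper; the names are illustrative only.)
    """
    if len(frames) != len(masks):
        raise ValueError("need exactly one mask per frame")
    height, width = len(frames[0]), len(frames[0][0])
    measurement = [[0.0] * width for _ in range(height)]
    for frame, mask in zip(frames, masks):
        for i in range(height):
            for j in range(width):
                # Element-wise modulation, accumulated over time.
                measurement[i][j] += mask[i][j] * frame[i][j]
    return measurement


def random_binary_masks(num_frames, height, width, seed=0):
    """Draw one random 0/1 mask per frame (binary masks are an assumption)."""
    rng = random.Random(seed)
    return [
        [[rng.randint(0, 1) for _ in range(width)] for _ in range(height)]
        for _ in range(num_frames)
    ]


# Toy input: T = 4 constant frames of size 2 x 3, frame t has value t + 1.
frames = [[[float(t + 1)] * 3 for _ in range(2)] for t in range(4)]
masks = random_binary_masks(4, 2, 3)
measurement = sci_measurement(frames, masks)
```

Whatever the number of input frames, the result is a single H x W array, which is what makes the reconstruction problem ill-posed and motivates a learned decoder.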
Pages: 9072 - 9089
Page count: 18
Related papers
50 in total
  • [41] Snapshot spectral compressive imaging reconstruction using convolution and contextual Transformer
    Wang, Lishun
    Wu, Zongliang
    Zhong, Yong
    Yuan, Xin
    PHOTONICS RESEARCH, 2022, 10 (08) : 1848 - 1858
  • [42] Spatial-temporal decorrelation for image/video coding
    Wang, Miaohui
    Ngan, King Ngi
    Xu, Long
    2012 PICTURE CODING SYMPOSIUM (PCS), 2012, : 201 - 204
  • [43] SPATIAL-TEMPORAL ATTENTION ANALYSIS FOR HOME VIDEO
    Qiu, Xuekan
    Jiang, Shuqiang
    Liu, Huiying
    Huang, Qingming
    Cao, Longbing
    2008 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, VOLS 1-4, 2008, : 1517 - +
  • [44] Video summarization by spatial-temporal graph optimization
    Lu, S
    Lyu, MR
    King, I
    2004 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS, VOL 2, PROCEEDINGS, 2004, : 197 - 200
  • [45] A spatial-temporal graph gated transformer for traffic forecasting
    Bouchemoukha, Haroun
    Zennir, Mohamed Nadjib
    Alioua, Ahmed
    TRANSACTIONS ON EMERGING TELECOMMUNICATIONS TECHNOLOGIES, 2024, 35 (07):
  • [46] A Lightweight and Accurate Spatial-Temporal Transformer for Traffic Forecasting
    Li, Guanyao
    Zhong, Shuhan
    Deng, Xingdong
    Xiang, Letian
    Chan, S. -H. Gary
    Li, Ruiyuan
    Liu, Yang
    Zhang, Ming
    Hung, Chih-Chieh
    Peng, Wen-Chih
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2023, 35 (11) : 10967 - 10980
  • [47] Spatial-Temporal Transformer for Crime Recognition in Surveillance Videos
    Boekhoudt, Kayleigh
    Talavera, Estefania
    2022 18TH IEEE INTERNATIONAL CONFERENCE ON ADVANCED VIDEO AND SIGNAL BASED SURVEILLANCE (AVSS 2022), 2022,
  • [48] Graph Spatial-Temporal Transformer Network for Traffic Prediction
    Zhao, Zhenzhen
    Shen, Guojiang
    Wang, Lei
    Kong, Xiangjie
    BIG DATA RESEARCH, 2024, 36
  • [49] Spatial-Temporal Transformer for Dynamic Scene Graph Generation
    Cong, Yuren
    Liao, Wentong
    Ackermann, Hanno
    Rosenhahn, Bodo
    Yang, Michael Ying
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 16352 - 16362
  • [50] A Multitemporal Scale and Spatial-Temporal Transformer Network for Temporal Action Localization
    Gao, Zan
    Cui, Xinglei
    Zhuo, Tao
    Cheng, Zhiyong
    Liu, An-An
    Wang, Meng
    Chen, Shenyong
    IEEE TRANSACTIONS ON HUMAN-MACHINE SYSTEMS, 2023, 53 (03) : 569 - 580