Hybrid CNN-Transformer Architecture for Efficient Large-Scale Video Snapshot Compressive Imaging

Cited by: 3
Authors
Cao, Miao [1, 2, 3]
Wang, Lishun [2, 3]
Zhu, Mingyu [2, 3]
Yuan, Xin [2, 3]
Affiliations
[1] Zhejiang Univ, Hangzhou 310058, Zhejiang, Peoples R China
[2] Westlake Univ, Sch Engn, Hangzhou 310030, Zhejiang, Peoples R China
[3] Westlake Univ, Res Ctr Ind Future, Hangzhou 310030, Zhejiang, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Computational imaging; Snapshot compressive imaging; Compressive sensing; Deep learning; Convolutional neural networks; Transformer;
DOI
10.1007/s11263-024-02101-y
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Video snapshot compressive imaging (SCI) uses a low-speed 2D detector to capture a high-speed scene: the dynamic scene is modulated by different masks and then compressed into a single snapshot measurement. A reconstruction algorithm is then needed to recover the high-speed video frames. Although state-of-the-art (SOTA) deep learning-based reconstruction algorithms have achieved impressive results, excessive model complexity and GPU memory limitations leave them with two challenges: (1) the models incur high computational cost, and (2) they usually cannot reconstruct large-scale video frames at high compression ratios. To address these issues, we develop an efficient network for video SCI, dubbed EfficientSCI++, that uses hierarchical residual-like connections and a hybrid CNN-Transformer structure within a single residual block. The EfficientSCI++ network explores spatial-temporal correlation by using convolution in the spatial domain and a Transformer in the temporal domain. We demonstrate for the first time that a UHD color video ($1644\times 3840\times 3$) with a high compression ratio (40) can be reconstructed from a snapshot 2D measurement using a single end-to-end deep learning model with PSNR above 34 dB. Moreover, a mixed-precision model is trained to further accelerate video SCI reconstruction and reduce its memory footprint. Extensive results on both simulation and real data demonstrate that, compared with previous SOTA methods, our proposed EfficientSCI++ and EfficientSCI achieve comparable reconstruction quality at much lower computational cost and with better real-time performance.
Code is available at https://github.com/mcao92/EfficientSCI-plus-plus.
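The measurement process described in the abstract (frames modulated by different masks, then compressed into one snapshot) can be illustrated with a minimal NumPy sketch. This is not the authors' code: it assumes the forward model commonly used in the video SCI literature, Y = sum_t M_t * X_t with element-wise products, binary random masks, and no noise. The function name `sci_forward` and the small array sizes are hypothetical, chosen only for illustration.

```python
import numpy as np


def sci_forward(frames: np.ndarray, masks: np.ndarray) -> np.ndarray:
    """Compress B frames of shape (B, H, W) into one (H, W) snapshot.

    Assumed model: Y = sum_t M_t * X_t (element-wise product, noise-free).
    """
    if frames.shape != masks.shape:
        raise ValueError("frames and masks must share shape (B, H, W)")
    # Modulate each frame by its own mask, then sum over the time axis.
    return (frames * masks).sum(axis=0)


# Hypothetical toy sizes; B plays the role of the compression ratio.
rng = np.random.default_rng(0)
B, H, W = 8, 16, 16
frames = rng.random((B, H, W))  # stand-in for a grayscale high-speed clip
masks = rng.integers(0, 2, size=(B, H, W)).astype(frames.dtype)  # binary masks

measurement = sci_forward(frames, masks)
print(measurement.shape)  # (16, 16)
```

Under this assumed model, B frames collapse into a single 2D array, so the reconstruction network has to recover B times as many values as it observes. That is why a compression ratio of 40 at UHD resolution is demanding in both compute and memory.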
Pages: 4521-4540
Number of pages: 20