Hybrid CNN-Transformer Architecture for Efficient Large-Scale Video Snapshot Compressive Imaging

Cited by: 3
Authors
Cao, Miao [1, 2, 3]
Wang, Lishun [2, 3]
Zhu, Mingyu [2, 3]
Yuan, Xin [2, 3]
Affiliations
[1] Zhejiang Univ, Hangzhou 310058, Zhejiang, Peoples R China
[2] Westlake Univ, Sch Engn, Hangzhou 310030, Zhejiang, Peoples R China
[3] Westlake Univ, Res Ctr Ind Future, Hangzhou 310030, Zhejiang, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Computational imaging; Snapshot compressive imaging; Compressive sensing; Deep learning; Convolutional neural networks; Transformer;
DOI
10.1007/s11263-024-02101-y
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Video snapshot compressive imaging (SCI) uses a low-speed 2D detector to capture a high-speed scene: the dynamic scene is modulated by different masks and then compressed into a single snapshot measurement. A reconstruction algorithm is then needed to recover the high-speed video frames. Although state-of-the-art (SOTA) deep learning-based reconstruction algorithms have achieved impressive results, they still face the following challenges due to excessive model complexity and GPU memory limitations: (1) they incur high computational cost, and (2) they are usually unable to reconstruct large-scale video frames at high compression ratios. To address these issues, we develop an efficient network for video SCI, dubbed EfficientSCI++, that uses hierarchical residual-like connections and a hybrid CNN-Transformer structure within a single residual block. The EfficientSCI++ network explores spatial-temporal correlation effectively by using convolution in the spatial domain and a Transformer in the temporal domain. We demonstrate for the first time that a UHD color video ($1644\times 3840\times 3$) with a high compression ratio (40) can be reconstructed from a snapshot 2D measurement using a single end-to-end deep learning model, with PSNR above 34 dB. Moreover, a mixed-precision model is trained to further accelerate the video SCI reconstruction process and reduce the memory footprint. Extensive results on both simulation and real data demonstrate that, compared with previous SOTA methods, our proposed EfficientSCI++ and EfficientSCI achieve comparable reconstruction quality with much lower computational cost and better real-time performance.
Code is available at https://github.com/mcao92/EfficientSCI-plus-plus.
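
The measurement process described in the abstract (each frame modulated by its own mask, then all frames summed into one 2D snapshot) can be illustrated with a minimal sketch. This is not the authors' code: the function names `simulate_snapshot` and `random_binary_masks` are hypothetical, random binary masks are an assumption made for illustration, and noise is ignored. The number of frames folded into one snapshot plays the role of the compression ratio (4 in this toy example, 40 in the abstract's UHD experiment).

```python
import random


def simulate_snapshot(frames, masks):
    """Compress T frames into one 2D measurement: Y = sum_t (M_t * X_t), element-wise."""
    if len(frames) != len(masks):
        raise ValueError("need exactly one mask per frame")
    height, width = len(frames[0]), len(frames[0][0])
    measurement = [[0.0] * width for _ in range(height)]
    for frame, mask in zip(frames, masks):
        for i in range(height):
            for j in range(width):
                # Each pixel is modulated by its mask value, then accumulated.
                measurement[i][j] += mask[i][j] * frame[i][j]
    return measurement


def random_binary_masks(num_frames, height, width, seed=0):
    """Hypothetical helper: one random 0/1 mask per frame (an illustrative assumption)."""
    rng = random.Random(seed)
    return [
        [[rng.randint(0, 1) for _ in range(width)] for _ in range(height)]
        for _ in range(num_frames)
    ]


# Toy scene: 4 frames of 2x3 pixels, where frame t has constant intensity t + 1.
frames = [[[float(t + 1)] * 3 for _ in range(2)] for t in range(4)]
masks = random_binary_masks(num_frames=4, height=2, width=3)
snapshot = simulate_snapshot(frames, masks)  # a single 2x3 measurement
```

A reconstruction network such as the one in the abstract would take `snapshot` and `masks` as input and estimate the original `frames`; that inverse step is not sketched here.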
Pages: 4521-4540
Page count: 20