VMG: Rethinking U-Net Architecture for Video Super-Resolution

Authors
Tang, Jun [1]
Niu, Lele [1]
Liu, Linlin [1]
Dai, Hang [2]
Ding, Yong [1]
Affiliations
[1] Zhejiang Univ, Coll Integrated Circuits, Hangzhou 310000, Peoples R China
[2] Univ Glasgow, Sch Comp Sci, Glasgow G12 8QQ, Scotland
Keywords
Computer architecture; Data mining; Superresolution; Mixers; Feature extraction; Computational modeling; Transformers; Logic gates; Correlation; Decoding; Video super-resolution; U-Net architecture; spatial-temporal; complexity; MLP;
DOI
10.1109/TBC.2024.3486967
Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification code
0808; 0809
Abstract
The U-Net architecture has exhibited significant efficacy across various vision tasks, yet its adaptation for Video Super-Resolution (VSR) remains underexplored. While the Video Restoration Transformer (VRT) introduced U-Net into the VSR domain, it poses challenges due to intricate design and substantial computational overhead. In this paper, we present VMG, a streamlined framework tailored for VSR. Through empirical analysis, we identify the crucial stages of the U-Net architecture contributing to performance enhancement in VSR tasks. Our optimized architecture substantially reduces model parameters and complexity while improving performance. Additionally, we introduce two key modules, namely the Gated MLP-like Mixer (GMM) and the Flow-Guided cross-attention Mixer (FGM), designed to enhance spatial and temporal feature aggregation. GMM dynamically encodes spatial correlations with linear complexity in space and time, and FGM leverages optical flow to capture motion variation and implement sparse attention to efficiently aggregate temporally related information. Extensive experiments demonstrate that VMG achieves nearly 70% reduction in GPU memory usage, 30% fewer parameters, and 10% lower computational complexity (FLOPs) compared to VRT, while yielding highly competitive or superior results across four benchmark datasets. Qualitative assessments reveal VMG's ability to preserve remarkable details and sharp structures in the reconstructed videos.
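The abstract describes GMM only at a high level, so the following is a minimal NumPy sketch of a generic gated MLP-style mixing unit, not the authors' GMM. The function name `gated_mlp_mixer`, the 1-D token layout, the 3-neighbour averaging on the gate branch, and all shapes are assumptions made for illustration. It shows one way a gate can carry local spatial context while the work per token stays constant, so the total cost grows linearly with the number of tokens.

```python
import numpy as np

def gated_mlp_mixer(x, w_in, w_out):
    """Generic gated MLP-style unit (illustrative, not the paper's GMM).

    x:     (num_tokens, channels) token features, tokens laid out in 1-D
    w_in:  (channels, 2 * hidden) expansion weights
    w_out: (hidden, channels) projection weights
    """
    h = x @ w_in                                  # per-token channel expansion
    value, gate = np.split(h, 2, axis=-1)         # two branches of width `hidden`
    # Assumption: give the gate local spatial context by averaging each token
    # with its two neighbours (circular). Constant work per token.
    gate = (np.roll(gate, 1, axis=0) + gate + np.roll(gate, -1, axis=0)) / 3.0
    gated = value * (1.0 / (1.0 + np.exp(-gate)))  # sigmoid gating
    return x + gated @ w_out                       # residual connection

rng = np.random.default_rng(0)
tokens, channels, hidden = 16, 8, 12
x = rng.standard_normal((tokens, channels))
w_in = rng.standard_normal((channels, 2 * hidden)) * 0.1
w_out = rng.standard_normal((hidden, channels)) * 0.1
y = gated_mlp_mixer(x, w_in, w_out)
```

Unlike full self-attention, which compares every token with every other token, this unit touches a fixed neighbourhood per token; how VMG actually achieves its linear complexity is specified in the paper, not here.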
Pages: 334-349
Number of pages: 16
Related papers
50 in total
  • [21] MICU: Image super-resolution via multi-level information compensation and U-net
    Chen, Yuantao
    Xia, Runlong
    Yang, Kai
    Zou, Ke
    EXPERT SYSTEMS WITH APPLICATIONS, 2024, 245
  • [22] D2UNet: Dual Decoder U-Net for Seismic Image Super-Resolution Reconstruction
    Min, Fan
    Wang, Linrong
    Pan, Shulin
    Song, Guojie
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2023, 61
  • [23] Expanding Horizons: U-Net Enhancements for Semantic Segmentation, Forecasting, and Super-Resolution in Ocean Remote Sensing
    Wang, Haoyu
    Li, Xiaofeng
    JOURNAL OF REMOTE SENSING, 2024, 4
  • [24] MultiResUNet: Rethinking the U-Net architecture for multimodal biomedical image segmentation
    Ibtehaz, Nabil
    Rahman, M. Sohel
    Neural Networks, 2020, 121 : 74 - 87
  • [26] Label Super-Resolution for 3D Magnetic Resonance Images using Deformable U-net
    Liu, Di
    Liu, Jiang
    Liu, Yihao
    Tao, Ran
    Prince, Jerry L.
    Carass, Aaron
    MEDICAL IMAGING 2021: IMAGE PROCESSING, 2021, 11596
  • [27] IESRGAN: Enhanced U-Net Structured Generative Adversarial Network for Remote Sensing Image Super-Resolution Reconstruction
    Yue, Xiaohan
    Liu, Danfeng
    Wang, Liguo
    Benediktsson, Jon Atli
    Meng, Linghong
    Deng, Lei
    REMOTE SENSING, 2023, 15 (14)
  • [28] Light-Guided and Cross-Fusion U-Net for Anti-Illumination Image Super-Resolution
    Cheng, Deqiang
    Chen, Liangliang
    Lv, Chen
    Guo, Lin
    Kou, Qiqi
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (12) : 8436 - 8449
  • [29] Progressive U-Net residual network for computed tomography images super-resolution in the screening of COVID-19
    Qiu, Defu
    Cheng, Yuhu
    Wang, Xuesong
    JOURNAL OF RADIATION RESEARCH AND APPLIED SCIENCES, 2021, 14 (01) : 369 - 379
  • [30] EVSRNet: Efficient Video Super-Resolution with Neural Architecture Search
    Liu, Shaoli
    Zheng, Chengjian
    Lu, Kaidi
    Gao, Si
    Wang, Ning
    Wang, Bofei
    Zhang, Diankai
    Zhang, Xiaofeng
    Xu, Tianyu
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2021, 2021, : 2480 - 2485