Shortcut-V2V: Compression Framework for Video-to-Video Translation based on Temporal Redundancy Reduction

Authors
Chung, Chaeyeon [1]
Park, Yeojeong [1, 2]
Choi, Seunghwan [1]
Ganbat, Munkhsoyol [1]
Choo, Jaegul [1]
Affiliations
[1] KAIST AI, Daejeon, South Korea
[2] KT Corp, KT Res & Dev Ctr, Seongnam Si, South Korea
DOI
10.1109/ICCV51070.2023.00700
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Video-to-video translation aims to generate video frames of a target domain from an input video. Despite its usefulness, existing networks require enormous computation, which makes model compression necessary for wide use. While compression methods exist that improve computational efficiency in various image and video tasks, a generally applicable compression method for video-to-video translation has received little study. In response, we present Shortcut-V2V, a general-purpose compression framework for video-to-video translation. Shortcut-V2V avoids full inference for every neighboring video frame by approximating the intermediate features of the current frame from those of the previous frame. Moreover, in our framework, a newly proposed block called AdaBD adaptively blends and deforms features of neighboring frames, which enables more accurate prediction of the intermediate features. We conduct quantitative and qualitative evaluations using well-known video-to-video translation models on various tasks to demonstrate the general applicability of our framework. The results show that Shortcut-V2V achieves performance comparable to the original video-to-video translation model while reducing computational cost by 3.2-5.7x and memory by 7.8-44x at test time. Our code and videos are available at https://shortcut-v2v.github.io/.
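The idea of skipping full inference on neighboring frames can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names translate_video, encode, decode, shortcut and interval are hypothetical, the fixed interval between fully processed frames is an assumption made only for this example, and the toy shortcut function is a stand-in that does not model AdaBD. The sketch only shows the control flow in which the expensive layers run on some frames, while the remaining frames reuse features from the most recent fully processed frame.

```python
from typing import Callable, List, Sequence

Frame = List[float]
Features = List[float]


def translate_video(
    frames: Sequence[Frame],
    encode: Callable[[Frame], Features],   # expensive layers (hypothetical)
    decode: Callable[[Features], Frame],   # remaining layers (hypothetical)
    shortcut: Callable[[Features, Frame, Frame], Features],  # cheap stand-in
    interval: int = 3,                     # assumed spacing of full inference
) -> List[Frame]:
    """Run full inference every `interval` frames, approximate the rest."""
    if interval < 1:
        raise ValueError("interval must be at least 1")
    outputs: List[Frame] = []
    ref_feats: Features = []
    ref_frame: Frame = []
    for i, frame in enumerate(frames):
        if i % interval == 0:
            # Full inference: compute intermediate features from scratch.
            ref_feats = encode(frame)
            ref_frame = frame
            feats = ref_feats
        else:
            # Shortcut: approximate features from the stored reference.
            feats = shortcut(ref_feats, ref_frame, frame)
        outputs.append(decode(feats))
    return outputs


# Toy stand-ins so the sketch runs; they are not real networks.
calls = {"encode": 0}


def toy_encode(frame: Frame) -> Features:
    calls["encode"] += 1
    return [2.0 * x for x in frame]


def toy_decode(feats: Features) -> Frame:
    return list(feats)


def toy_shortcut(ref_feats: Features, ref_frame: Frame, frame: Frame) -> Features:
    # Shift reference features by the scaled change between input frames.
    return [f + 2.0 * (c - r) for f, r, c in zip(ref_feats, ref_frame, frame)]


video = [[float(t), float(t) + 0.5] for t in range(6)]
outputs = translate_video(video, toy_encode, toy_decode, toy_shortcut, interval=3)
```

With six frames and an interval of 3, the expensive toy_encode runs on two frames only. The toy shortcut is exact here because the toy encoder is linear, which would not hold for a real translation network.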
Pages: 7578-7588
Page count: 11