HyperCon: Image-To-Video Model Transfer for Video-To-Video Translation Tasks

Cited by: 3
Authors
Szeto, Ryan [1 ]
El-Khamy, Mostafa [2 ]
Lee, Jungwon [2 ]
Corso, Jason J. [1 ]
Affiliations
[1] Univ Michigan, Ann Arbor, MI 48109 USA
[2] Samsung Semicond Inc, San Jose, CA USA
DOI
10.1109/WACV48630.2021.00312
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Video-to-video translation is more difficult than image-to-image translation due to the temporal consistency problem that, if unaddressed, leads to distracting flickering effects. Although video models designed from scratch produce temporally consistent results, training them to match the vast visual knowledge captured by image models requires an intractable number of videos. To combine the benefits of image and video models, we propose an image-to-video model transfer method called Hyperconsistency (HyperCon) that transforms any well-trained image model into a temporally consistent video model without fine-tuning. HyperCon works by translating a temporally interpolated video frame-wise and then aggregating over temporally localized windows on the interpolated video. It handles both masked and unmasked inputs, enabling support for even more video-to-video translation tasks than prior image-to-video model transfer techniques. We demonstrate HyperCon on video style transfer and inpainting, where it performs favorably compared to prior state-of-the-art methods without training on a single stylized or incomplete video. Our project website is available at ryanszeto.com/projects/hypercon.
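The pipeline described in the abstract (temporally interpolate the input video, translate each interpolated frame independently with the image model, then aggregate over temporally localized windows) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the linear frame interpolation and mean aggregation below are simplifying stand-ins for the learned interpolation network and aggregation scheme HyperCon actually uses, and all function names are illustrative.

```python
import numpy as np

def interpolate_frames(video, factor=2):
    """Insert `factor - 1` linearly interpolated frames between each
    consecutive pair (stand-in for a learned frame interpolation network)."""
    frames = []
    for a, b in zip(video[:-1], video[1:]):
        frames.append(a)
        for k in range(1, factor):
            t = k / factor
            frames.append((1 - t) * a + t * b)
    frames.append(video[-1])
    return np.stack(frames)

def hypercon(video, image_model, factor=2, window=5):
    """Sketch of the HyperCon idea: frame-wise translation of an
    interpolated video, then temporally localized aggregation sampled
    back at the original frame rate."""
    # 1) Temporally interpolate the input video.
    interp = interpolate_frames(video, factor)
    # 2) Translate every interpolated frame independently.
    translated = np.stack([image_model(f) for f in interp])
    # 3) Aggregate each original-frame position over a local temporal
    #    window (mean here; the paper uses an aligned aggregation).
    half = window // 2
    out = []
    for i in range(0, len(translated), factor):
        lo, hi = max(0, i - half), min(len(translated), i + half + 1)
        out.append(translated[lo:hi].mean(axis=0))
    return np.stack(out)
```

With an identity `image_model`, the output has the same number of frames as the input, which illustrates why the method needs no fine-tuning: the image model is only ever applied to single frames, and temporal consistency comes from the interpolation-plus-aggregation wrapper around it.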
Pages: 3079-3088
Page count: 10
Related Papers
50 items in total
  • [1] Video-to-Video Translation with Global Temporal Consistency
    Wei, Xingxing
    Zhu, Jun
    Feng, Sitong
    Su, Hang
    [J]. PROCEEDINGS OF THE 2018 ACM MULTIMEDIA CONFERENCE (MM'18), 2018, : 18 - 25
  • [2] Unsupervised Image-to-Video Clothing Transfer
    Pumarola, A.
    Goswami, V.
    Vicente, F.
    De la Torre, F.
    Moreno-Noguer, F.
    [J]. 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW), 2019, : 3181 - 3184
  • [3] Mocycle-GAN: Unpaired Video-to-Video Translation
    Chen, Yang
    Pan, Yingwei
    Yao, Ting
    Tian, Xinmei
    Mei, Tao
    [J]. PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA (MM'19), 2019, : 647 - 655
  • [4] An Image-to-video Model for Real-Time Video Enhancement
    She, Dongyu
    Xu, Kun
    [J]. PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022, : 1837 - 1846
  • [5] MeDM: Mediating Image Diffusion Models for Video-to-Video Translation with Temporal Correspondence Guidance
    Chu, Ernie
    Huang, Tzuhsuan
    Lin, Shuo-Yen
    Chen, Jun-Cheng
    [J]. THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 2, 2024, : 1353 - 1361
  • [6] Video-to-Video Synthesis
    Wang, Ting-Chun
    Liu, Ming-Yu
    Zhu, Jun-Yan
    Liu, Guilin
    Tao, Andrew
    Kautz, Jan
    Catanzaro, Bryan
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31
  • [7] Facial Image-to-Video Translation by a Hidden Affine Transformation
    Shen, Guangyao
    Huang, Wenbing
    Gan, Chuang
    Tan, Mingkui
    Huang, Junzhou
    Zhu, Wenwu
    Gong, Boqing
    [J]. PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA (MM'19), 2019, : 2505 - 2513
  • [8] Application of Video-to-Video Translation Networks to Computational Fluid Dynamics
    Kigure, Hiromitsu
    [J]. FRONTIERS IN ARTIFICIAL INTELLIGENCE, 2021, 4
  • [9] Unsupervised video-to-video translation with preservation of frame modification tendency
    Liu, Huajun
    Li, Chao
    Lei, Dian
    Zhu, Qing
    [J]. VISUAL COMPUTER, 2020, 36 (10-12): : 2105 - 2116