Deep Blind Video Decaptioning by Temporal Aggregation and Recurrence

Cited by: 16
Authors
Kim, Dahun [1 ]
Woo, Sanghyun [1 ]
Lee, Joon-Young [2 ]
Kweon, In So [1 ]
Affiliations
[1] Korea Adv Inst Sci & Technol, Daejeon, South Korea
[2] Adobe Res, San Jose, CA USA
Source
2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019) | 2019
Funding
National Research Foundation of Singapore;
Keywords
DOI
10.1109/CVPR.2019.00439
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Blind video decaptioning is the problem of automatically removing text overlays and inpainting the occluded parts of videos without any input masks. While recent deep-learning-based inpainting methods deal with a single image and mostly assume that the positions of the corrupted pixels are known, we aim at automatic text removal in video sequences without mask information. In this paper, we propose a simple yet effective framework for fast blind video decaptioning. We construct an encoder-decoder model, where the encoder takes multiple source frames that can provide visible pixels revealed by the scene dynamics. These hints are aggregated and fed into the decoder. We apply a residual connection from the input frame to the decoder output to force the network to focus on the corrupted regions only. Our proposed model ranked first in the ECCV ChaLearn 2018 LAP Inpainting Competition Track 2: Video decaptioning. In addition, we further improve this strong model by applying recurrent feedback. The recurrent feedback not only enforces temporal coherence but also provides strong clues on where the corrupted pixels are. Both qualitative and quantitative experiments demonstrate that our full model produces accurate and temporally consistent video results in real time (50+ fps).
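To make the described architecture concrete, below is a minimal PyTorch sketch of the overall idea: an encoder that aggregates the center frame, its temporal neighbors, and the previously restored frame (recurrent feedback), followed by a decoder whose output is added to the input frame via a residual connection. This is not the authors' implementation; the layer widths, depths, and the simple channel-concatenation aggregation are assumptions made purely for illustration.

```python
# Minimal sketch of the encoder-decoder + residual + recurrent-feedback idea.
# NOT the authors' network: channel widths, layer counts, and the concatenation-based
# aggregation of neighbor frames are illustrative assumptions only.
import torch
import torch.nn as nn


class BlindDecaptionNet(nn.Module):
    def __init__(self, n_neighbors=4, base_ch=64):
        super().__init__()
        # Encoder input: center frame + N temporal neighbors + previously restored frame,
        # all concatenated along the channel dimension (3 RGB channels each).
        in_ch = 3 * (1 + n_neighbors + 1)
        self.encoder = nn.Sequential(
            nn.Conv2d(in_ch, base_ch, 5, stride=1, padding=2), nn.ReLU(inplace=True),
            nn.Conv2d(base_ch, base_ch * 2, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(base_ch * 2, base_ch * 4, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        # Decoder: upsample back to full resolution and predict a residual image.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(base_ch * 4, base_ch * 2, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(base_ch * 2, base_ch, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(base_ch, 3, 3, stride=1, padding=1),
        )

    def forward(self, center, neighbors, prev_output):
        # center:      (B, 3, H, W)     frame to be decaptioned
        # neighbors:   (B, N, 3, H, W)  temporal neighbors providing hints from scene dynamics
        # prev_output: (B, 3, H, W)     previously restored frame (recurrent feedback)
        b, n, c, h, w = neighbors.shape
        x = torch.cat([center, neighbors.reshape(b, n * c, h, w), prev_output], dim=1)
        residual = self.decoder(self.encoder(x))
        # Residual connection from the input frame: the network only needs to
        # predict corrections for the corrupted (text-covered) regions.
        return center + residual


# Usage sketch: feed frames sequentially, reusing each output as the next recurrent input.
net = BlindDecaptionNet()
center = torch.rand(1, 3, 128, 128)
neighbors = torch.rand(1, 4, 3, 128, 128)
prev = torch.zeros_like(center)        # zeros for the first frame of a clip
out = net(center, neighbors, prev)     # restored frame, shape (1, 3, 128, 128)
```

In the sketch, the recurrent feedback is simply the previous output concatenated with the inputs; the paper argues this both stabilizes the result temporally and hints at where corrupted pixels lie.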
Pages: 4258-4267
Number of pages: 10
Related papers
50 records in total
  • [1] Recurrent Temporal Aggregation Framework for Deep Video Inpainting
    Kim, Dahun
    Woo, Sanghyun
    Lee, Joon-Young
    Kweon, In So
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2020, 42 (05) : 1038 - 1052
  • [2] Deep Video Matting via Spatio-Temporal Alignment and Aggregation
    Sun, Yanan
    Wang, Guanzhi
    Gu, Qiao
    Tang, Chi-Keung
    Tai, Yu-Wing
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 6971 - 6980
  • [3] DEEP BLIND VIDEO QUALITY ASSESSMENT BASED ON TEMPORAL HUMAN PERCEPTION
    Ahn, Sewoong
    Lee, Sanghoon
    2018 25TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2018, : 619 - 623
  • [4] Deep Local and Global Spatiotemporal Feature Aggregation for Blind Video Quality Assessment
    Zhou, Wei
    Chen, Zhibo
    2020 IEEE INTERNATIONAL CONFERENCE ON VISUAL COMMUNICATIONS AND IMAGE PROCESSING (VCIP), 2020, : 338 - 341
  • [5] Blind Video Temporal Consistency
    Bonneel, Nicolas
    Tompkin, James
    Sunkavalli, Kalyan
    Sun, Deqing
    Paris, Sylvain
    Pfister, Hanspeter
    ACM TRANSACTIONS ON GRAPHICS, 2015, 34 (06):
  • [6] A Deep Spatial and Temporal Aggregation Framework for Video-Based Facial Expression Recognition
    Pan, Xianzhang
    Ying, Guoliang
    Chen, Guodong
    Li, Hongming
    Li, Wenshu
    IEEE ACCESS, 2019, 7 : 48807 - 48815
  • [7] Deep Temporal-Spatial Aggregation for Video-Based Facial Expression Recognition
    Pan, Xianzhang
    Guo, Wenping
    Guo, Xiaoying
    Li, Wenshu
    Xu, Junjie
    Wu, Jinzhao
    SYMMETRY-BASEL, 2019, 11 (01):
  • [8] Learning Blind Video Temporal Consistency
    Lai, Wei-Sheng
    Huang, Jia-Bin
    Wang, Oliver
    Shechtman, Eli
    Yumer, Ersin
    Yang, Ming-Hsuan
    COMPUTER VISION - ECCV 2018, PT 15, 2018, 11219 : 179 - 195
  • [9] ViDeNN: Deep Blind Video Denoising
    Claus, Michele
    van Gemert, Jan
    2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPRW 2019), 2019, : 1843 - 1852
  • [10] Blind Natural Video Quality Prediction via Statistical Temporal Features and Deep Spatial Features
    Korhonen, Jari
    Su, Yicheng
    You, Junyong
    MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, 2020, : 3311 - 3319