Deep Blind Video Decaptioning by Temporal Aggregation and Recurrence

Cited by: 16
Authors
Kim, Dahun [1 ]
Woo, Sanghyun [1 ]
Lee, Joon-Young [2 ]
Kweon, In So [1 ]
Affiliations
[1] Korea Adv Inst Sci & Technol, Daejeon, South Korea
[2] Adobe Res, San Jose, CA USA
Source
2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019) | 2019
Funding
National Research Foundation of Singapore;
Keywords
DOI
10.1109/CVPR.2019.00439
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Blind video decaptioning is the problem of automatically removing text overlays and inpainting the occluded parts of videos without any input masks. While recent deep-learning-based inpainting methods deal with a single image and mostly assume that the positions of the corrupted pixels are known, we aim at automatic text removal in video sequences without mask information. In this paper, we propose a simple yet effective framework for fast blind video decaptioning. We construct an encoder-decoder model, where the encoder takes multiple source frames that can provide visible pixels revealed by the scene dynamics. These hints are aggregated and fed into the decoder. We apply a residual connection from the input frame to the decoder output to force the network to focus on the corrupted regions only. Our proposed model ranked first in the ECCV ChaLearn 2018 LAP Inpainting Competition Track 2: Video decaptioning. In addition, we further improve this strong model by applying recurrent feedback. The recurrent feedback not only enforces temporal coherence but also provides strong clues on where the corrupted pixels are. Both qualitative and quantitative experiments demonstrate that our full model produces accurate and temporally consistent video results in real time (50+ fps).
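To make the described architecture concrete, below is a minimal PyTorch sketch of the overall idea: an encoder that aggregates the center frame, its temporal neighbors, and the previously restored frame (recurrent feedback), followed by a decoder whose output is added to the input frame via a residual connection. This is not the authors' implementation; the layer widths, depths, and the simple channel-concatenation aggregation are assumptions made purely for illustration.

```python
# Minimal sketch of the encoder-decoder + residual + recurrent-feedback idea.
# NOT the authors' network: channel widths, layer counts, and the concatenation-based
# aggregation of neighbor frames are illustrative assumptions only.
import torch
import torch.nn as nn


class BlindDecaptionNet(nn.Module):
    def __init__(self, n_neighbors=4, base_ch=64):
        super().__init__()
        # Encoder input: center frame + N temporal neighbors + previously restored frame,
        # all concatenated along the channel dimension (3 RGB channels each).
        in_ch = 3 * (1 + n_neighbors + 1)
        self.encoder = nn.Sequential(
            nn.Conv2d(in_ch, base_ch, 5, stride=1, padding=2), nn.ReLU(inplace=True),
            nn.Conv2d(base_ch, base_ch * 2, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(base_ch * 2, base_ch * 4, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        # Decoder: upsample back to full resolution and predict a residual image.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(base_ch * 4, base_ch * 2, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(base_ch * 2, base_ch, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(base_ch, 3, 3, stride=1, padding=1),
        )

    def forward(self, center, neighbors, prev_output):
        # center:      (B, 3, H, W)     frame to be decaptioned
        # neighbors:   (B, N, 3, H, W)  temporal neighbors providing hints from scene dynamics
        # prev_output: (B, 3, H, W)     previously restored frame (recurrent feedback)
        b, n, c, h, w = neighbors.shape
        x = torch.cat([center, neighbors.reshape(b, n * c, h, w), prev_output], dim=1)
        residual = self.decoder(self.encoder(x))
        # Residual connection from the input frame: the network only needs to
        # predict corrections for the corrupted (text-covered) regions.
        return center + residual


# Usage sketch: feed frames sequentially, reusing each output as the next recurrent input.
net = BlindDecaptionNet()
center = torch.rand(1, 3, 128, 128)
neighbors = torch.rand(1, 4, 3, 128, 128)
prev = torch.zeros_like(center)        # zeros for the first frame of a clip
out = net(center, neighbors, prev)     # restored frame, shape (1, 3, 128, 128)
```

In the sketch, the recurrent feedback is simply the previous output concatenated with the inputs; the paper argues this both stabilizes the result temporally and hints at where corrupted pixels lie.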
Pages: 4258-4267
Number of pages: 10
Related papers
50 records in total
  • [1] Recurrent Temporal Aggregation Framework for Deep Video Inpainting
    Kim, Dahun
    Woo, Sanghyun
    Lee, Joon-Young
    Kweon, In So
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2020, 42 (05) : 1038 - 1052
  • [2] Deep Video Matting via Spatio-Temporal Alignment and Aggregation
    Sun, Yanan
    Wang, Guanzhi
    Gu, Qiao
    Tang, Chi-Keung
    Tai, Yu-Wing
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 6971 - 6980
  • [3] DEEP BLIND VIDEO QUALITY ASSESSMENT BASED ON TEMPORAL HUMAN PERCEPTION
    Ahn, Sewoong
    Lee, Sanghoon
    2018 25TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2018, : 619 - 623
  • [4] Deep Local and Global Spatiotemporal Feature Aggregation for Blind Video Quality Assessment
    Zhou, Wei
    Chen, Zhibo
    2020 IEEE INTERNATIONAL CONFERENCE ON VISUAL COMMUNICATIONS AND IMAGE PROCESSING (VCIP), 2020, : 338 - 341
  • [5] Blind Video Temporal Consistency
    Bonneel, Nicolas
    Tompkin, James
    Sunkavalli, Kalyan
    Sun, Deqing
    Paris, Sylvain
    Pfister, Hanspeter
    ACM TRANSACTIONS ON GRAPHICS, 2015, 34 (06):
  • [6] A Deep Spatial and Temporal Aggregation Framework for Video-Based Facial Expression Recognition
    Pan, Xianzhang
    Ying, Guoliang
    Chen, Guodong
    Li, Hongming
    Li, Wenshu
    IEEE ACCESS, 2019, 7 : 48807 - 48815
  • [7] Deep Temporal-Spatial Aggregation for Video-Based Facial Expression Recognition
    Pan, Xianzhang
    Guo, Wenping
    Guo, Xiaoying
    Li, Wenshu
    Xu, Junjie
    Wu, Jinzhao
    SYMMETRY-BASEL, 2019, 11 (01):
  • [8] Learning Blind Video Temporal Consistency
    Lai, Wei-Sheng
    Huang, Jia-Bin
    Wang, Oliver
    Shechtman, Eli
    Yumer, Ersin
    Yang, Ming-Hsuan
    COMPUTER VISION - ECCV 2018, PT 15, 2018, 11219 : 179 - 195
  • [9] ViDeNN: Deep Blind Video Denoising
    Claus, Michele
    van Gemert, Jan
    2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPRW 2019), 2019, : 1843 - 1852
  • [10] Blind Natural Video Quality Prediction via Statistical Temporal Features and Deep Spatial Features
    Korhonen, Jari
    Su, Yicheng
    You, Junyong
    MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, 2020, : 3311 - 3319