DeepVideoMVS: Multi-View Stereo on Video with Recurrent Spatio-Temporal Fusion

Cited by: 35

Authors:

Duzceker, Arda [1]

Galliani, Silvano [2]

Vogel, Christoph [2]

Speciale, Pablo [2]

Dusmanu, Mihai [1]

Pollefeys, Marc [1,2]

Affiliations:

[1] Swiss Fed Inst Technol, Dept Comp Sci, Zurich, Switzerland

[2] Microsoft Mixed Real & AI Zurich Lab, Zurich, Switzerland

Source:

2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021 | 2021

Keywords:

DOI:

10.1109/CVPR46437.2021.01507

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification code:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We propose an online multi-view depth prediction approach on posed video streams, where the scene geometry information computed in the previous time steps is propagated to the current time step in an efficient and geometrically plausible way. The backbone of our approach is a real-time capable, lightweight encoder-decoder that relies on cost volumes computed from pairs of images. We extend it by placing a ConvLSTM cell at the bottleneck layer, which compresses an arbitrary amount of past information in its states. The novelty lies in propagating the hidden state of the cell by accounting for the viewpoint changes between time steps. At a given time step, we warp the previous hidden state into the current camera plane using the previous depth prediction. Our extension adds only a small overhead in computation time and memory consumption, while improving the depth predictions significantly. As a result, we outperform the existing state-of-the-art multi-view stereo methods on most of the evaluated metrics in hundreds of indoor scenes while maintaining real-time performance. Code is available at https://github.com/ardaduz/deep-video-mvs
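The warping step described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation (see the linked repository for that); it is a minimal forward warp under stated assumptions: the hidden state and the previous depth map share one resolution, `K` holds the pinhole intrinsics at that resolution, and `T_cur_from_prev` is the rigid transform from the previous to the current camera frame. The function name `warp_hidden_state`, the nearest-pixel splatting, and the z-buffer tie-breaking are all illustrative choices, not details taken from the paper.

```python
import numpy as np


def warp_hidden_state(hidden_prev, depth_prev, K, T_cur_from_prev):
    """Forward-warp a (C, H, W) hidden state from the previous view to the current view.

    Hypothetical helper for illustration only.
    hidden_prev:      (C, H, W) feature map from the previous time step
    depth_prev:       (H, W) previous depth prediction at the same resolution
    K:                (3, 3) pinhole intrinsics at that resolution (assumed)
    T_cur_from_prev:  (4, 4) rigid transform, previous camera -> current camera
    """
    C, H, W = hidden_prev.shape
    v, u = np.mgrid[0:H, 0:W]
    pix = np.stack([u.ravel(), v.ravel(), np.ones(H * W)], axis=0)  # (3, N)

    # Back-project previous pixels to 3D points in the previous camera frame.
    depth = depth_prev.ravel()
    pts_prev = (np.linalg.inv(K) @ pix) * depth

    # Express the points in the current camera frame and project them.
    pts_cur = T_cur_from_prev[:3, :3] @ pts_prev + T_cur_from_prev[:3, 3:4]
    z = pts_cur[2]
    proj = K @ pts_cur

    valid = (depth > 0) & (z > 1e-6)
    z_safe = np.where(valid, z, 1.0)  # avoid division by zero for invalid points
    u_cur = np.rint(proj[0] / z_safe).astype(int)
    v_cur = np.rint(proj[1] / z_safe).astype(int)
    valid &= (u_cur >= 0) & (u_cur < W) & (v_cur >= 0) & (v_cur < H)

    # z-buffer: when several source pixels land on one target pixel,
    # keep the one closest to the current camera.
    src = np.flatnonzero(valid)
    tgt = v_cur[src] * W + u_cur[src]
    order = np.lexsort((z[src], tgt))  # sort by target index, then by depth
    tgt_sorted = tgt[order]
    first = np.ones(order.size, dtype=bool)
    first[1:] = tgt_sorted[1:] != tgt_sorted[:-1]

    warped = np.zeros((C, H * W), dtype=hidden_prev.dtype)
    warped[:, tgt_sorted[first]] = hidden_prev.reshape(C, -1)[:, src[order[first]]]
    return warped.reshape(C, H, W)  # target pixels nothing maps to stay zero
```

With an identity pose the function returns the input unchanged; with a sideways camera translation the features shift by the corresponding disparity, and pixels that no source point reaches stay zero. A practical system would more likely use differentiable bilinear sampling on the GPU, which this sketch does not attempt.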


Pages: 15319-15328

Page count: 10
