Encoder-Decoder Structure with Multiscale Receptive Field Block for Unsupervised Depth Estimation from Monocular Video

Cited by: 1

Authors:

Chen, Songnan [1]

Han, Junyu [2]

Tang, Mengxia [2]

Dong, Ruifang [2]

Kan, Jiangming [2]

Affiliations:

[1] Wuhan Polytech Univ, Sch Math & Comp Sci, 36 Huanhu Middle Rd, Wuhan 430048, Peoples R China

[2] Beijing Forestry Univ, Sch Technol, 35 Qinghua East Rd, Beijing 100083, Peoples R China

Source:

REMOTE SENSING | 2022 / Vol. 14 / Issue 12

Funding:

National Natural Science Foundation of China;

Keywords:

monocular depth estimation; unsupervised learning methods; structure from motion; confidence mask; COST AGGREGATION;

DOI:

10.3390/rs14122906

Chinese Library Classification (CLC) number:

X [Environmental Science, Safety Science];

Discipline classification code:

08 ; 0830 ;

Abstract:

Monocular depth estimation is a fundamental yet challenging task in computer vision because depth information is lost when 3D scenes are projected onto 2D images. Although deep learning-based methods have considerably improved depth estimation from a single image, most existing approaches still fail to overcome this limitation. Supervised learning methods model depth estimation as a regression problem and therefore require large amounts of ground-truth depth data for training in real scenarios. Unsupervised learning methods treat depth estimation as the synthesis of a new disparity map, which means that rectified stereo image pairs are needed as the training dataset. To address these problems, we present an encoder-decoder-based framework that infers depth maps from monocular video snippets in an unsupervised manner. First, we design an unsupervised learning scheme for monocular depth estimation based on the basic principles of structure from motion (SfM); it uses only adjacent video clips, rather than paired training data, as supervision. Second, our method predicts two confidence masks to make the depth estimation model more robust to occlusion. Finally, we use the largest-scale and minimum depth loss instead of the multiscale and average loss to improve the accuracy of depth estimation. Experimental results on the KITTI depth estimation benchmark show that our method outperforms competing unsupervised methods.
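
The abstract contrasts a minimum loss with an average loss but gives no formula. The sketch below shows one plausible reading, offered purely as an assumption: each adjacent frame yields a per-pixel photometric error map, and the maps are combined either by per-pixel minimum or by per-pixel mean. The function name, the input layout, and the toy error values are hypothetical and are not taken from the paper.

```python
def reduce_photometric_errors(errors_per_source, mode="min"):
    """Combine per-pixel error maps from several adjacent frames into one scalar.

    errors_per_source: list of equally sized 2D lists, one per adjacent frame
    (hypothetical layout; the paper's actual error terms are not given here).
    mode: "min" keeps the smallest error at each pixel, "mean" averages them.
    """
    if not errors_per_source:
        raise ValueError("need at least one error map")
    height = len(errors_per_source[0])
    width = len(errors_per_source[0][0])
    total = 0.0
    for y in range(height):
        for x in range(width):
            values = [err[y][x] for err in errors_per_source]
            if mode == "min":
                total += min(values)
            elif mode == "mean":
                total += sum(values) / len(values)
            else:
                raise ValueError("mode must be 'min' or 'mean'")
    # Average over all pixels so the result does not depend on image size.
    return total / (height * width)


# Toy 2x2 error maps: each frame has one pixel with a large error,
# standing in for a pixel that is occluded in that frame only.
prev_err = [[0.1, 0.9], [0.2, 0.2]]
next_err = [[0.1, 0.1], [0.8, 0.2]]
loss_min = reduce_photometric_errors([prev_err, next_err], mode="min")
loss_mean = reduce_photometric_errors([prev_err, next_err], mode="mean")
```

On this toy input the minimum reduction gives roughly 0.15 and the mean roughly 0.325. Under this reading, a pixel that is visible in at least one adjacent frame is not penalized by the large error from the frame where it is hidden.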

Pages: 17
