Encoder-Decoder Structure with Multiscale Receptive Field Block for Unsupervised Depth Estimation from Monocular Video

Cited by: 1
Authors
Chen, Songnan [1]
Han, Junyu [2]
Tang, Mengxia [2]
Dong, Ruifang [2]
Kan, Jiangming [2]
Affiliations
[1] Wuhan Polytech Univ, Sch Math & Comp Sci, 36 Huanhu Middle Rd, Wuhan 430048, Peoples R China
[2] Beijing Forestry Univ, Sch Technol, 35 Qinghua East Rd, Beijing 100083, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
monocular depth estimation; unsupervised learning methods; structure from motion; confidence mask; cost aggregation
DOI
10.3390/rs14122906
Chinese Library Classification
X [Environmental Science, Safety Science]
Discipline classification code
08; 0830
Abstract
Monocular depth estimation is a fundamental yet challenging task in computer vision because depth information is lost when 3D scenes are mapped to 2D images. Although deep learning-based methods have considerably improved depth estimation from a single image, most existing approaches still fail to overcome this limitation. Supervised learning methods model depth estimation as a regression problem and, as a result, require large amounts of ground truth depth data for training in real scenarios. Unsupervised learning methods treat depth estimation as the synthesis of a new disparity map, which means that rectified stereo image pairs must be used as the training dataset. To address these problems, we present an encoder-decoder based framework that infers depth maps from monocular video snippets in an unsupervised manner. First, we design an unsupervised learning scheme for the monocular depth estimation task based on the basic principles of structure from motion (SfM); it uses only adjacent video clips, rather than paired training data, as supervision. Second, our method predicts two confidence masks to make the depth estimation model more robust to the occlusion problem. Finally, we use the largest scale and minimum depth loss instead of the multiscale and average loss to improve the accuracy of depth estimation. Experimental results on the benchmark KITTI dataset for depth estimation show that our method outperforms competing unsupervised methods.
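The abstract does not define the "minimum" loss. One plausible reading, assumed here and not confirmed by this record, is that each target pixel keeps the smallest photometric error across the adjacent source frames instead of the average, so a pixel occluded in one neighbour is still explained by the other. The NumPy sketch below illustrates only that reduction; the function names, the L1 error, and the toy frames are hypothetical and are not taken from the paper.

```python
import numpy as np

def photometric_error(target, warped):
    """Per-pixel L1 error, averaged over the colour channel axis."""
    return np.abs(target - warped).mean(axis=-1)

def reduce_errors(target, warped_views, mode="min"):
    """Combine per-view errors per pixel, then average over the image.

    mode="min" keeps the smallest error per pixel (assumed reading of the
    "minimum" loss); any other value averages the errors across views.
    """
    errors = np.stack([photometric_error(target, w) for w in warped_views], axis=0)
    per_pixel = errors.min(axis=0) if mode == "min" else errors.mean(axis=0)
    return float(per_pixel.mean())

# Toy 2x2 RGB target frame and two "warped" neighbouring frames.
target = np.full((2, 2, 3), 0.5)
warped_prev = target.copy()
warped_prev[0, 0] = 1.0   # pixel corrupted (e.g. occluded) in the previous frame
warped_next = target.copy()
warped_next[1, 1] = 0.0   # a different pixel corrupted in the next frame

loss_min = reduce_errors(target, [warped_prev, warped_next], mode="min")
loss_avg = reduce_errors(target, [warped_prev, warped_next], mode="avg")
print(loss_min, loss_avg)
```

In this toy case every pixel is matched correctly by at least one neighbour, so the minimum reduction yields zero loss while the average still penalises the two corrupted pixels.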
Pages: 17
Related papers
50 in total
  • [1] Encoder-decoder with densely convolutional networks for monocular depth estimation
    Chen, Songnan
    Tang, Mengxia
    Kan, Jiangming
    [J]. JOURNAL OF THE OPTICAL SOCIETY OF AMERICA A-OPTICS IMAGE SCIENCE AND VISION, 2019, 36 (10) : 1709 - 1718
  • [2] A Dual Encoder-Decoder Network for Self-Supervised Monocular Depth Estimation
    Zheng, Mingkui
    Luo, Lin
    Zheng, Haifeng
    Ye, Zhangfan
    Su, Zhe
    [J]. IEEE SENSORS JOURNAL, 2023, 23 (17) : 19747 - 19756
  • [3] Encoder-Decoder Structure With the Feature Pyramid for Depth Estimation From a Single Image
    Tang, Mengxia
    Chen, Songnan
    Dong, Ruifang
    Kan, Jiangming
    [J]. IEEE ACCESS, 2021, 9 : 22640 - 22650
  • [4] Unsupervised Monocular Depth Estimation From Light Field Image
    Zhou, Wenhui
    Zhou, Enci
    Liu, Gaomin
    Lin, Lili
    Lumsdaine, Andrew
    [J]. IEEE TRANSACTIONS ON IMAGE PROCESSING, 2020, 29 : 1606 - 1617
  • [5] Unsupervised Depth Estimation from Monocular Video based on Relative Motion
    Cao, Hui
    Wang, Chao
    Wang, Ping
    Zou, Qingquan
    Xiao, Xiao
    [J]. 2018 INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND MACHINE LEARNING (SPML 2018), 2018, : 159 - 165
  • [6] Encoder-Decoder Structure Fusing Depth Information for Outdoor Semantic Segmentation
    Chen, Songnan
    Tang, Mengxia
    Dong, Ruifang
    Kan, Jiangming
    [J]. APPLIED SCIENCES-BASEL, 2023, 13 (17):
  • [7] MED-VT: Multiscale Encoder-Decoder Video Transformer with Application to Object Segmentation
    Karim, Rezaul
    Zhao, He
    Wildes, Richard P.
    Siam, Mennatullah
    [J]. 2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR, 2023, : 6323 - 6333
  • [8] SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
    Papa, Lorenzo
    Alati, Edoardo
    Russo, Paolo
    Amerini, Irene
    [J]. IEEE Access, 2022, 10 : 44881 - 44890
  • [10] Rethinking the Encoder-decoder Structure in Medical Image Segmentation from Releasing Decoder Structure
    Ni, Jiajia
    Mu, Wei
    Pan, An
    Chen, Zhengming
    [J]. JOURNAL OF BIONIC ENGINEERING, 2024, 21 (03) : 1511 - 1521