Monocular Image Depth Estimation Based on Multi-Scale Attention Oriented Network

Cited: 0
Authors
Liu J. [1 ]
Wen J. [1 ]
Liang Y. [1 ]
Affiliations
[1] School of Electronic and Information Engineering, South China University of Technology, Guangzhou
Source
Huanan Ligong Daxue Xuebao/Journal of South China University of Technology (Natural Science) | 2020, Vol. 48, No. 12
Funding
National Natural Science Foundation of China;
Keywords
Channel attention fusion; Deep learning; Monocular image depth estimation; Multi-scale attention-oriented network; Multi-scale feature;
DOI
10.12141/j.issn.1000-565X.200083
Abstract
To address the low spatial resolution and blurred edges of existing deep-learning-based monocular depth estimation algorithms, a monocular image depth estimation algorithm based on a multi-scale attention-oriented network was proposed. First, an end-to-end encoder-decoder model was designed in which the encoder extracts features at multiple scales. To ensure better depth continuity, the decoder gradually refines the details and scene structure of the extracted multi-scale features by combining residual learning with channel attention fusion. To compensate for the loss of depth detail caused by repeated down-sampling, a boundary enhancement module was designed: by introducing spatial attention, it increases the inter-class contrast between different objects and thereby sharpens the boundary details of the image. Finally, an optimization module fuses the multi-scale features from the decoder and the boundary enhancement module to generate the depth image. Experimental results show that, compared with current mainstream algorithms, the proposed algorithm produces depth images of higher quality with more detailed object contours, and performs well in both objective metrics and subjective visual quality. © 2020, Editorial Department, Journal of South China University of Technology. All rights reserved.
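The record does not include source code; the following is a minimal, illustrative PyTorch sketch of the two mechanisms named in the abstract, channel attention fusion of decoder features and a spatial-attention boundary enhancement module. All class names, channel widths, and the wiring between modules are assumptions made for illustration and are not the authors' implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ChannelAttentionFusion(nn.Module):
    # Fuses a decoder feature with a same-sized encoder/skip feature and
    # re-weights channels with an SE-style attention vector (assumed design).
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, decoder_feat, skip_feat):
        fused = decoder_feat + skip_feat                    # residual-style fusion
        w = F.adaptive_avg_pool2d(fused, 1).flatten(1)      # global channel statistics
        w = self.fc(w).unsqueeze(-1).unsqueeze(-1)          # per-channel weights in (0, 1)
        return fused * w + decoder_feat                     # attended features plus residual path

class BoundaryEnhancement(nn.Module):
    # Spatial attention over a shallow feature map to emphasize object
    # boundaries (assumed average/max channel descriptors).
    def __init__(self, channels):
        super().__init__()
        self.attn_conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)
        self.refine = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x):
        avg = torch.mean(x, dim=1, keepdim=True)            # per-pixel mean over channels
        mx, _ = torch.max(x, dim=1, keepdim=True)           # per-pixel max over channels
        attn = torch.sigmoid(self.attn_conv(torch.cat([avg, mx], dim=1)))
        return self.refine(x * attn)                        # boundary-weighted, refined features

# Usage with hypothetical 64-channel feature maps at 1/4 input resolution:
dec = torch.randn(1, 64, 60, 80)
skip = torch.randn(1, 64, 60, 80)
fused = ChannelAttentionFusion(64)(dec, skip)               # -> (1, 64, 60, 80)
edges = BoundaryEnhancement(64)(skip)                       # -> (1, 64, 60, 80)

In the described network, outputs of this kind would then be fused by a final optimization module to produce the depth map; that stage is omitted from this sketch.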
Pages: 52-62
Page count: 10