EMTNet: efficient mobile transformer network for real-time monocular depth estimation

Authors
Long Yan
Fuyang Yu
Chao Dong
Affiliations
[1] Shandong Technology and Business University, School of Management Science and Engineering
[2] Shandong Technology and Business University, School of Information and Electronic Engineering
Source
Pattern Analysis and Applications, 2023, 26 (04)
Keywords
Deep learning; Vision transformer; Monocular depth estimation; Real-time task; Attention mechanism
DOI
Not available
Abstract
Estimating depth from a single image is challenging because recovering the depth of a 3D scene from one 2D view is inherently ill-posed and ambiguous. Prior approaches to monocular depth estimation have mainly relied on Convolutional Neural Networks (CNNs) or Vision Transformers (ViTs) as the primary feature extractors. However, these methods have struggled to balance speed and accuracy in real-time tasks. In this study, we propose a new model called EMTNet, which extracts image features at both local and global scales by combining a CNN and a ViT. To reduce the number of parameters, EMTNet introduces the mobile transformer block (MTB), which reuses parameters from self-attention. High-resolution depth maps are generated by fusing multi-scale features in the decoder. Comprehensive validation on the NYU Depth V2 and KITTI datasets shows that EMTNet outperforms previous real-time monocular depth estimation models based on CNNs and hybrid architectures. In addition, we conduct generalization tests and ablation experiments to verify our conjectures. The depth maps output by EMTNet exhibit intricate details, and the model attains a real-time frame rate of 32 FPS, achieving a good balance between speed and accuracy.
Pages: 1833-1846
Number of pages: 13
Related papers
50 in total
  • [1] EMTNet: efficient mobile transformer network for real-time monocular depth estimation
    Yan, Long
    Yu, Fuyang
    Dong, Chao
    [J]. PATTERN ANALYSIS AND APPLICATIONS, 2023, 26 (04) : 1833 - 1846
  • [2] Towards Real-Time Monocular Depth Estimation For Mobile Systems
    Deldjoo, Yashar
    Di Noia, Tommaso
    Di Sciascio, Eugenio
    Pernisco, Gaetano
    Reno, Vito
    Stella, Ettore
    [J]. MULTIMODAL SENSING AND ARTIFICIAL INTELLIGENCE: TECHNOLOGIES AND APPLICATIONS II, 2021, 11785
  • [3] Real-time Monocular Depth Estimation with Sparse Supervision on Mobile
    Yucel, Mehmet Kerim
    Dimaridou, Valia
    Drosou, Anastasios
    Saa-Garriga, Albert
    [J]. 2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2021, 2021 : 2428 - 2437
  • [4] OptiDepthNet: A Real-Time Unsupervised Monocular Depth Estimation Network
    Wei, Feng
    Yin, XingHui
    Shen, Jie
    Wang, HuiBin
    [J]. WIRELESS PERSONAL COMMUNICATIONS, 2023, 128 (04) : 2831 - 2846
  • [5] OptiDepthNet: A Real-Time Unsupervised Monocular Depth Estimation Network
    Feng Wei
    XingHui Yin
    Jie Shen
    HuiBin Wang
    [J]. Wireless Personal Communications, 2023, 128 : 2831 - 2846
  • [6] Real-Time and Accurate Self-Supervised Monocular Depth Estimation on Mobile Device
    Cai, Hong
    Yin, Fei
    Singhal, Tushar
    Pendyam, Sandeep
    Noorzad, Parham
    Zhu, Yinhao
    Nguyen, Khoi
    Matai, Janarbek
    Ramaswamy, Bharath
    Mayer, Frank
    Patel, Chirag
    Khobare, Abhijit
    Porikli, Fatih
    [J]. NEURIPS 2021 COMPETITIONS AND DEMONSTRATIONS TRACK, VOL 176, 2021, 176 : 308 - 313
  • [7] Real-time Monocular Depth Estimation with Extremely Light-Weight Neural Network
    Chiu, Mian-Jhong
    Chiu, Wei-Chen
    Chen, Hua-Tsung
    Chuang, Jen-Hui
    [J]. 2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021 : 7050 - 7057
  • [8] Real-Time Depth Estimation from a Monocular Moving Camera
    Handa, Aniket
    Sharma, Prateek
    [J]. CONTEMPORARY COMPUTING, 2012, 306 : 494 - 495
  • [9] Towards real-time unsupervised monocular depth estimation on CPU
    Poggi, Matteo
    Aleotti, Filippo
    Tosi, Fabio
    Mattoccia, Stefano
    [J]. 2018 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2018 : 5848 - 5854
  • [10] Real-time monocular depth estimation with adaptive receptive fields
    Ji, Zhenyan
    Song, Xiaojun
    Guo, Xiaoxuan
    Wang, Fangshi
    Armendariz-Inigo, Jose Enrique
    [J]. JOURNAL OF REAL-TIME IMAGE PROCESSING, 2021, 18 (04) : 1369 - 1381