eViTBins: Edge-Enhanced Vision-Transformer Bins for Monocular Depth Estimation on Edge Devices

Cited by: 0
Authors
She, Yutong [1 ]
Li, Peng [1 ]
Wei, Mingqiang [1 ]
Liang, Dong [1 ]
Chen, Yiping [2 ]
Xie, Haoran [3 ]
Wang, Fu Lee [4 ]
Affiliations
[1] Nanjing Univ Aeronaut & Astronaut, Sch Comp Sci & Technol, Nanjing 211106, Peoples R China
[2] Sun Yat Sen Univ, Sch Geospatial Engn & Sci, Zhuhai 519082, Peoples R China
[3] Lingnan Univ, Sch Data Sci, Hong Kong, Peoples R China
[4] Hong Kong Metropolitan Univ, Sch Sci & Technol, Hong Kong, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Edge-enhanced vision transformer; adaptive depth bins; monocular depth estimation; edge AI; unmanned aerial vehicle; traffic monitoring;
DOI
10.1109/TITS.2024.3480114
CLC Classification
TU [Building science];
Discipline Code
0813;
Abstract
Monocular depth estimation (MDE) remains a fundamental yet unsolved problem in computer vision. Existing MDE methods often produce blurred or even indistinct depth boundaries, degrading the quality of vision-based intelligent transportation systems. This paper presents an edge-enhanced vision transformer bins network for monocular depth estimation, termed eViTBins. eViTBins has three core modules to predict monocular depth maps with exceptional smoothness, accuracy, and fidelity to scene structures and object edges. First, a multi-scale feature fusion module is proposed to prevent the loss of depth information across feature levels during depth regression. Second, an image-guided edge-enhancement module is proposed to accurately infer depth values around image boundaries. Third, a vision transformer-based depth discretization module is introduced to capture the global depth distribution. Meanwhile, unlike most MDE models that rely on high-performance GPUs, eViTBins is optimized for seamless deployment on edge devices, such as the NVIDIA Jetson Nano and Google Coral SBC, making it well suited to real-time intelligent transportation system applications. Extensive experimental evaluations corroborate the superiority of eViTBins over competing methods, notably in preserving depth edges and global depth representations.
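To illustrate the "depth bins" idea referenced by the abstract (a transformer branch predicts adaptive per-image depth bins whose centers are combined with per-pixel probabilities to form the depth map), the following PyTorch sketch shows a generic adaptive-bins head. It is a minimal illustration under assumed names and dimensions (AdaptiveBinsHead, bin_mlp, prob_head, the pooled-token stand-in for a ViT token), not the authors' implementation of eViTBins.

# Illustrative sketch only: an adaptive depth-bins head in the spirit of
# ViT-bins methods. All module names and sizes here are assumptions.
import torch
import torch.nn as nn

class AdaptiveBinsHead(nn.Module):
    def __init__(self, in_channels=128, n_bins=256, min_depth=0.1, max_depth=10.0):
        super().__init__()
        self.min_depth, self.max_depth = min_depth, max_depth
        # MLP standing in for the transformer branch that predicts bin widths.
        self.bin_mlp = nn.Sequential(
            nn.Linear(in_channels, 256), nn.ReLU(), nn.Linear(256, n_bins))
        # Per-pixel logits over the bins (1x1 conv on decoder features).
        self.prob_head = nn.Conv2d(in_channels, n_bins, kernel_size=1)

    def forward(self, feats):
        # feats: decoder feature map of shape (B, C, H, W).
        b, c, h, w = feats.shape
        # Global token: pooled features (placeholder for a ViT token).
        token = feats.mean(dim=(2, 3))                       # (B, C)
        widths = torch.softmax(self.bin_mlp(token), dim=1)   # widths sum to 1
        # Cumulative widths -> bin edges -> bin centers in metric depth.
        zeros = torch.zeros(b, 1, device=feats.device)
        edges = self.min_depth + (self.max_depth - self.min_depth) * \
            torch.cumsum(torch.cat([zeros, widths], dim=1), dim=1)
        centers = 0.5 * (edges[:, :-1] + edges[:, 1:])        # (B, n_bins)
        probs = torch.softmax(self.prob_head(feats), dim=1)   # (B, n_bins, H, W)
        # Depth = probability-weighted sum of bin centers at every pixel.
        depth = torch.einsum('bnhw,bn->bhw', probs, centers).unsqueeze(1)
        return depth

# Usage: depth = AdaptiveBinsHead()(torch.randn(2, 128, 60, 80))  # (2, 1, 60, 80)

Because the bin widths are predicted per image, the depth range is discretized adaptively for each scene; the paper's edge-enhancement and multi-scale fusion modules would operate on the decoder features before such a head.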
Pages: 20320 - 20334
Number of pages: 15