eViTBins: Edge-Enhanced Vision-Transformer Bins for Monocular Depth Estimation on Edge Devices

Cited: 0
Authors
She, Yutong [1 ]
Li, Peng [1 ]
Wei, Mingqiang [1 ]
Liang, Dong [1 ]
Chen, Yiping [2 ]
Xie, Haoran [3 ]
Wang, Fu Lee [4 ]
Affiliations
[1] Nanjing Univ Aeronaut & Astronaut, Sch Comp Sci & Technol, Nanjing 211106, Peoples R China
[2] Sun Yat Sen Univ, Sch Geospatial Engn & Sci, Zhuhai 519082, Peoples R China
[3] Lingnan Univ, Sch Data Sci, Hong Kong, Peoples R China
[4] Hong Kong Metropolitan Univ, Sch Sci & Technol, Hong Kong, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Edge-enhanced vision transformer; adaptive depth bins; monocular depth estimation; edge AI; unmanned aerial vehicle; traffic monitoring;
DOI
10.1109/TITS.2024.3480114
CLC Classification
TU [Building Science]
Subject Classification Code
0813
Abstract
Monocular depth estimation (MDE) remains a fundamental yet unsolved problem in computer vision. Existing MDE methods often produce blurred or even indistinct depth boundaries, degrading the quality of vision-based intelligent transportation systems. This paper presents an edge-enhanced vision transformer bins network for monocular depth estimation, termed eViTBins. eViTBins has three core modules to predict monocular depth maps with exceptional smoothness, accuracy, and fidelity to scene structures and object edges. First, a multi-scale feature fusion module is proposed to prevent the loss of depth information at various levels during depth regression. Second, an image-guided edge-enhancement module is proposed to accurately infer depth values around image boundaries. Third, a vision transformer-based depth discretization module is introduced to capture the global depth distribution. Meanwhile, unlike most MDE models that rely on high-performance GPUs, eViTBins is optimized for seamless deployment on edge devices, such as the NVIDIA Jetson Nano and Google Coral SBC, making it ideal for real-time intelligent transportation system applications. Extensive experimental evaluations corroborate the superiority of eViTBins over competing methods, notably in preserving depth edges and global depth representations.
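The depth discretization module described above follows the general adaptive-bins idea: the network predicts a per-image partition of the depth range into bins, and each pixel's depth is regressed as a probability-weighted combination of the bin centers. A minimal NumPy sketch of this generic scheme (in the AdaBins style; the function name, bin count, and depth range below are illustrative assumptions, not the authors' implementation):

```python
import numpy as np

def adaptive_bins_depth(bin_logits, bin_probs, d_min=1e-3, d_max=10.0):
    """Generic adaptive depth-bin regression sketch.

    bin_logits: (N,) per-image logits that define N adaptive bin widths.
    bin_probs:  (H, W, N) per-pixel probabilities over the N bins.
    Returns the (H, W) depth map and the (N,) bin centers.
    """
    # Softmax-normalize widths so they partition [d_min, d_max].
    widths = np.exp(bin_logits - bin_logits.max())
    widths = widths / widths.sum() * (d_max - d_min)
    # Cumulative sums give bin edges; centers are edge midpoints.
    edges = d_min + np.concatenate([[0.0], np.cumsum(widths)])
    centers = 0.5 * (edges[:-1] + edges[1:])
    # Final depth: probability-weighted sum of bin centers per pixel.
    depth = bin_probs @ centers  # (H, W)
    return depth, centers

# Hypothetical usage with random stand-ins for network outputs.
rng = np.random.default_rng(0)
logits = rng.normal(size=16)                 # one logit per bin
probs = rng.random((8, 8, 16))               # per-pixel bin scores
probs /= probs.sum(axis=-1, keepdims=True)   # normalize to probabilities
depth, centers = adaptive_bins_depth(logits, probs)
```

Because every pixel's depth is a convex combination of the bin centers, the output is guaranteed to lie inside the predicted depth range, while the learned bin widths let the network allocate finer resolution where the scene's depth distribution is dense.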
Pages: 20320-20334 (15 pages)
Related Papers
50 in total
  • [41] Attention-Based Self-Supervised Learning Monocular Depth Estimation with Edge Refinement. Jiang, Chenweinan; Liu, Haichun; Li, Lanzhen; Pan, Changchun. 2021 IEEE International Conference on Image Processing (ICIP), 2021: 3218-3222.
  • [42] Adaptive Weighted Network with Edge Enhancement Module for Monocular Self-Supervised Depth Estimation. Liu, Hong; Zhu, Ying; Hua, Guoliang; Huang, Weibo; Ding, Runwei. 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022: 2340-2344.
  • [43] Optimize Vision Transformer Architecture via Efficient Attention Modules: A Study on the Monocular Depth Estimation Task. Schiavella, Claudio; Cirillo, Lorenzo; Papa, Lorenzo; Russo, Paolo; Amerini, Irene. Image Analysis and Processing - ICIAP 2023 Workshops, Pt I, 2024, 14365: 383-394.
  • [44] An Objective Sharpness Evaluation Method for Edge-Enhanced Digital Halftone Images Using Cooperative Human Vision Model. Matsui, Toshikazu; Shioda, Hayato. IDW '06: Proceedings of the 13th International Display Workshops, Vols 1-3, 2006: 515-518.
  • [45] EDFIDepth: Enriched Multi-Path Vision Transformer Feature Interaction Networks for Monocular Depth Estimation. Xia, Chenxing; Zhang, Mengge; Gao, Xiuju; Ge, Bin; Li, Kuan-Ching; Fang, Xianjin; Zhang, Yan; Liang, Xingzhu. Journal of Supercomputing, 2024, 80(14): 21023-21047.
  • [46] Edge-Enhanced Heterogeneous Graph Transformer with Priority-Based Feature Aggregation for Multi-Agent Trajectory Prediction. Zhou, Xiangzheng; Chen, Xiaobo; Yang, Jian. IEEE Transactions on Intelligent Transportation Systems, 2025, 26(02): 2266-2281.
  • [47] Attention Mono-Depth: Attention-Enhanced Transformer for Monocular Depth Estimation of Volatile Kiln Burden Surface. Liu, Cong; Zhang, Chaobo; Liang, Xiaojun; Han, Zhiming; Li, Yiming; Yang, Chunhua; Gui, Weihua; Gao, Wen; Wang, Xiaohao; Li, Xinghui. IEEE Transactions on Circuits and Systems for Video Technology, 2025, 35(02): 1686-1699.
  • [48] Retraining-Free Constraint-Aware Token Pruning for Vision Transformer on Edge Devices. Yu, Yun-Chia; Weng, Mao-Chi; Lin, Ming-Guang; Wu, An-Yeu. 2024 IEEE International Symposium on Circuits and Systems (ISCAS), 2024.
  • [49] Optimization of Stereo Vision Depth Estimation Using Edge-Based Disparity Map. Du, Juan; Okae, James. 2017 10th International Conference on Electrical and Electronics Engineering (ELECO), 2017: 1171-1175.
  • [50] FasterMDE: A Real-Time Monocular Depth Estimation Search Method That Balances Accuracy and Speed on the Edge. ZiWen, Dou; YuQi, Li; Dong, Ye. Applied Intelligence, 2023, 53(20): 24566-24586.