Monocular 3D Object Detection With Motion Feature Distillation

Cited by: 2
Authors
Hu, Henan [1, 2]
Li, Muyu [3]
Zhu, Ming [1]
Gao, Wen [4]
Liu, Peiyu [5]
Chan, Kwok-Leung [6]
Affiliations
[1] Chinese Acad Sci, Changchun Inst Opt Fine Mech & Phys, Changchun 130033, Peoples R China
[2] Univ Chinese Acad Sci, Beijing 100049, Peoples R China
[3] Ctr Intelligent Multidimens Data Anal Ltd, Hong Kong, Peoples R China
[4] BYD Auto Ind Co Ltd, Shenzhen 518119, Peoples R China
[5] Shenyang Aircraft Design & Res Inst, Shenyang 110036, Liaoning, Peoples R China
[6] City Univ Hong Kong, Dept Elect Engn, Hong Kong, Peoples R China
Keywords
Three-dimensional displays; Object detection; Feature extraction; Estimation; Location awareness; Image resolution; Solid modeling; 3D object detection; bird's-eye-view (BEV); monocular depth estimation; motion feature; knowledge distillation; autonomous driving;
DOI
10.1109/ACCESS.2023.3300708
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In autonomous driving, environmental perception over a 360-degree field of view is extremely important. It can be achieved by detecting three-dimensional (3D) objects in the surrounding scene from inputs acquired by sensors such as LiDAR or RGB cameras. The resulting 3D perception is commonly represented as a bird's-eye view (BEV) of the sensor. RGB cameras have the advantages of low cost and long-range acquisition. However, because RGB images are two-dimensional (2D), a BEV generated from them suffers from low accuracy due to limitations such as the lack of temporal correlation. To address these problems, we propose a monocular 3D object detection method based on long short-term feature fusion and motion feature distillation. Long short-term temporal features are extracted at different feature map resolutions. The motion features and depth information are combined and encoded by an encoder based on the Transformer cross-correlation module, and then integrated into the BEV space of the fused long short-term temporal features. Subsequently, a decoder with motion feature distillation localizes objects in 3D space. By combining BEV feature representations from different time steps and supplementing them with embedded motion features and depth information, the proposed method significantly improves the accuracy of monocular 3D object detection, as demonstrated by experimental results on the nuScenes dataset. It outperforms state-of-the-art methods, improving on the previous best method by 6.7% in mAP and 8.3% in mATE.
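The two ideas named in the abstract, fusing BEV features across time steps and distilling features from one branch into another, can be illustrated with a toy sketch. This is not the authors' implementation: the paper uses multi-resolution feature maps and a Transformer-based encoder, whereas the sketch below assumes flat feature vectors, an exponentially decayed weighted average for fusion, and a mean-squared-error distillation loss. The names `fuse_temporal`, `distillation_loss`, and the `decay` parameter are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence

Vector = List[float]


def fuse_temporal(frames: Sequence[Vector], decay: float = 0.5) -> Vector:
    """Fuse per-time-step feature vectors (oldest first, newest last).

    Assumption: a decayed weighted average stands in for the paper's
    long short-term fusion, which is not specified in the abstract.
    """
    if not frames:
        raise ValueError("need at least one frame")
    n = len(frames)
    dim = len(frames[0])
    if any(len(f) != dim for f in frames):
        raise ValueError("all frames must have the same length")
    # Newest frame gets weight 1; each step back in time multiplies by `decay`.
    weights = [decay ** (n - 1 - t) for t in range(n)]
    total = sum(weights)
    return [
        sum(w * f[i] for w, f in zip(weights, frames)) / total
        for i in range(dim)
    ]


def distillation_loss(student: Vector, teacher: Vector) -> float:
    """Mean squared error between student and teacher feature vectors.

    Assumption: MSE is a common generic choice for feature distillation;
    the abstract does not state which loss the paper uses.
    """
    if len(student) != len(teacher):
        raise ValueError("feature vectors must have the same length")
    return sum((s - t) ** 2 for s, t in zip(student, teacher)) / len(student)


if __name__ == "__main__":
    fused = fuse_temporal([[0.0, 0.0], [3.0, 6.0]], decay=0.5)
    print(fused)
    print(distillation_loss([1.0, 2.0], [1.0, 4.0]))
```

In a real detector the vectors would be dense BEV feature maps and the loss would be one term among several training objectives; the sketch only shows the shape of the computation.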
Pages: 82933-82945
Number of pages: 13
Related papers
50 in total
  • [21] Progressive Coordinate Transforms for Monocular 3D Object Detection
    Wang, Li
    Zhang, Li
    Zhu, Yi
    Zhang, Zhi
    He, Tong
    Li, Mu
    Xue, Xiangyang
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [22] Exploring Geometric Consistency for Monocular 3D Object Detection
    Lian, Qing
    Ye, Botao
    Xu, Ruijia
    Yao, Weilong
    Zhang, Tong
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022: 1675-1684
  • [23] MonoSG: Monocular 3D Object Detection With Stereo Guidance
    Fan, Zhiwei
    Xu, Chao
    Chu, Minghang
    Huang, Yuling
    Ma, Yaoyao
    Wang, Jing
    Xu, Yishen
    Wu, Di
    IEEE ROBOTICS AND AUTOMATION LETTERS, 2025, 10 (4): 3604-3611
  • [24] Monocular Object Detection Using 3D Geometric Primitives
    Carr, Peter
    Sheikh, Yaser
    Matthews, Iain
    COMPUTER VISION - ECCV 2012, PT I, 2012, 7572: 864-878
  • [25] Monocular 3D Object Detection from Roadside Infrastructure
    Huang, Delu
    Wen, Feng
    2024 35TH IEEE INTELLIGENT VEHICLES SYMPOSIUM, IEEE IV 2024, 2024: 1672-1677
  • [26] Dense-JANet for Monocular 3D Object Detection
    Shang, Xiaoqing
    Cheng, Zhiwei
    Shi, Su
    Cheng, Zhuanghao
    Huang, Hongcheng
    2020 IEEE 23RD INTERNATIONAL CONFERENCE ON INTELLIGENT TRANSPORTATION SYSTEMS (ITSC), 2020
  • [27] Monocular 3D object detection for an indoor robot environment
    Kim, Jiwon
    Lee, GiJae
    Kim, Jun-Sik
    Kim, Hyunwoo J.
    Kim, KangGeon
    2020 29TH IEEE INTERNATIONAL CONFERENCE ON ROBOT AND HUMAN INTERACTIVE COMMUNICATION (RO-MAN), 2020: 438-445
  • [28] MonoCD: Monocular 3D Object Detection with Complementary Depths
    Yan, Longfei
    Yan, Pei
    Xiong, Shengzhou
    Xiang, Xuanyu
    Tan, Yihua
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2024: 10248-10257
  • [29] A New Monocular 3D Object Detection with Neural Network
    Hong, Weijie
    Liu, Yiguang
    Zheng, Yunan
    Wang, Ying
    Shi, Xuelei
    PATTERN RECOGNITION AND COMPUTER VISION (PRCV 2018), PT IV, 2018, 11259: 174-185
  • [30] 3D Visual Object Detection from Monocular Images
    Wang, Qiaosong
    Rasmussen, Christopher
    ADVANCES IN VISUAL COMPUTING, ISVC 2019, PT I, 2020, 11844: 168-180