Selective Transfer Learning of Cross-Modality Distillation for Monocular 3D Object Detection

Cited by: 1
Authors:
Ding, Rui [1 ]
Yang, Meng [1 ]
Zheng, Nanning [1 ]
Affiliations:
[1] Xi An Jiao Tong Univ, Inst Artificial Intelligence & Robot, Xian 710049, Peoples R China
Funding:
U.S. National Science Foundation;
Keywords:
Three-dimensional displays; Laser radar; Uncertainty; Object detection; Feature extraction; Estimation; Knowledge engineering; 3D object detection; depth estimation; cross-modality; knowledge distillation; selective transfer;
DOI: 10.1109/TCSVT.2024.3405992
CLC Classification: TM (Electrical Engineering); TN (Electronics and Communication Technology);
Discipline Codes: 0808; 0809
Abstract
Monocular 3D object detection is a promising yet ill-posed task for autonomous vehicles due to the lack of accurate depth information. Cross-modality knowledge distillation can effectively transfer depth information from LiDAR to an image-based network. However, the modality gap between image and LiDAR severely limits its accuracy. In this paper, we systematically investigate, for the first time, the negative transfer problem induced by the modality gap in cross-modality distillation, covering not only the architecture inconsistency issue but, more importantly, the feature overfitting issue. We propose a selective learning approach named MonoSTL to overcome these issues, which encourages positive transfer of depth information from LiDAR while alleviating negative transfer to the image-based network. On the one hand, we use similar architectures to ensure spatial alignment of features between the image-based and LiDAR-based networks. On the other hand, we develop two novel distillation modules, Depth-Aware Selective Feature Distillation (DASFD) and Depth-Aware Selective Relation Distillation (DASRD), which selectively learn positive features and relationships of objects by integrating depth uncertainty into feature and relation distillation, respectively. Our approach can be seamlessly integrated into various CNN-based and DETR-based models; we validate it on three recent models on KITTI and one recent model on NuScenes. Extensive experiments show that our approach considerably improves the accuracy of the base models and thereby achieves the best accuracy among all recently released SOTA models. The code is released at https://github.com/DingCodeLab/MonoSTL.
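The abstract's core idea of depth-aware selective distillation, weighting feature imitation by the estimated depth uncertainty so that confidently estimated regions transfer knowledge while uncertain ones are suppressed, can be sketched as follows. This is an illustrative reconstruction, not the paper's released code: the function name, the exponential uncertainty-to-weight mapping, and the array shapes are all assumptions made for the sketch.

```python
import numpy as np

def selective_feature_distill_loss(student_feat, teacher_feat, depth_sigma):
    """Depth-uncertainty-weighted feature distillation (hedged sketch).

    student_feat, teacher_feat: (H, W, C) feature maps from the image-based
        student and the LiDAR-based teacher, assumed spatially aligned.
    depth_sigma: (H, W) per-location depth uncertainty (higher = less certain).
    """
    # Map uncertainty to a selection weight in (0, 1]: confident locations
    # (sigma near 0) get weight near 1, uncertain ones are down-weighted.
    w = np.exp(-depth_sigma)
    # Per-location feature-imitation error, averaged over channels.
    per_loc = ((student_feat - teacher_feat) ** 2).mean(axis=-1)
    # Normalized weighted loss, so the scale is comparable across images.
    return float((w * per_loc).sum() / (w.sum() + 1e-8))
```

The same weighting could be applied to pairwise feature relations for a relation-distillation variant; the selective effect is that a mismatched but depth-uncertain region contributes almost nothing to the loss.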
Pages: 9925-9938 (14 pages)
Related Papers (50 items)
  • [31] Asymmetric cross-modality interaction network for RGB-D salient object detection
    Su, Yiming
    Gao, Haoran
    Wang, Mengyin
    Wang, Fasheng
    EXPERT SYSTEMS WITH APPLICATIONS, 2025, 275
  • [32] Monocular 3D Object Detection for Autonomous Driving
    Chen, Xiaozhi
    Kundu, Kaustav
    Zhang, Ziyu
    Ma, Huimin
    Fidler, Sanja
    Urtasun, Raquel
    2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016, : 2147 - 2156
  • [33] Dimension Embeddings for Monocular 3D Object Detection
    Zhang, Yunpeng
    Zheng, Wenzhao
    Zhu, Zheng
    Huang, Guan
    Du, Dalong
    Zhou, Jie
    Lu, Jiwen
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 1579 - 1588
  • [34] Learning Discriminative Cross-Modality Features for RGB-D Saliency Detection
    Wang, Fengyun
    Pan, Jinshan
    Xu, Shoukun
    Tang, Jinhui
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2022, 31 : 1285 - 1297
  • [35] Uncertainty Prediction for Monocular 3D Object Detection
    Mun, Junghwan
    Choi, Hyukdoo
    SENSORS, 2023, 23 (12)
  • [36] Multivariate Probabilistic Monocular 3D Object Detection
    Shi, Xuepeng
    Chen, Zhixiang
    Kim, Tae-Kyun
    2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2023, : 4270 - 4279
  • [37] Homography Loss for Monocular 3D Object Detection
    Gu, Jiaqi
    Wu, Bojian
    Fan, Lubin
    Huang, Jianqiang
    Cao, Shen
    Xiang, Zhiyu
    Hua, Xian-Sheng
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 1070 - 1079
  • [38] Monocular 3D object detection for distant objects
    Li, Jiahao
    Han, Xiaohong
    JOURNAL OF ELECTRONIC IMAGING, 2024, 33 (03) : 33021
  • [39] Cross-Domain and Cross-Modality Transfer Learning for Multi-domain and Multi-modality Event Detection
    Yang, Zhenguo
    Cheng, Min
    Li, Qing
    Li, Yukun
    Lin, Zehang
    Liu, Wenyin
    WEB INFORMATION SYSTEMS ENGINEERING, WISE 2017, PT I, 2017, 10569 : 516 - 523
  • [40] Multi-Scale Enhanced Depth Knowledge Distillation for Monocular 3D Object Detection with SEFormer
    Zhang, Han
    Li, Jun
    Tang, Rui
    Shi, Zhiping
    Bu, Aojie
    2023 IEEE INTERNATIONAL CONFERENCES ON INTERNET OF THINGS (ITHINGS), IEEE GREEN COMPUTING AND COMMUNICATIONS (GREENCOM), IEEE CYBER, PHYSICAL AND SOCIAL COMPUTING (CPSCOM), IEEE SMART DATA (SMARTDATA), AND IEEE CONGRESS ON CYBERMATICS, 2024, : 38 - 43