Selective Transfer Learning of Cross-Modality Distillation for Monocular 3D Object Detection

Cited by: 1
Authors
Ding, Rui [1]
Yang, Meng [1]
Zheng, Nanning [1]
Affiliations
[1] Xi An Jiao Tong Univ, Inst Artificial Intelligence & Robot, Xian 710049, Peoples R China
Funding
U.S. National Science Foundation;
Keywords
Three-dimensional displays; Laser radar; Uncertainty; Object detection; Feature extraction; Estimation; Knowledge engineering; 3D object detection; depth estimation; cross-modality; knowledge distillation; selective transfer;
DOI
10.1109/TCSVT.2024.3405992
CLC Classification
TM [Electrical Engineering]; TN [Electronics & Communication Technology];
Subject Classification
0808; 0809;
Abstract
Monocular 3D object detection is a promising yet ill-posed task for autonomous vehicles due to the lack of accurate depth information. Cross-modality knowledge distillation can effectively transfer depth information from LiDAR to an image-based network; however, the modality gap between images and LiDAR severely limits its accuracy. In this paper, we systematically investigate, for the first time, the negative transfer problem induced by the modality gap in cross-modality distillation, covering not only the architecture inconsistency issue but, more importantly, the feature overfitting issue. We propose a selective learning approach named MonoSTL to overcome these issues, which encourages positive transfer of depth information from LiDAR while alleviating negative transfer to the image-based network. On the one hand, we use similar architectures to ensure spatial alignment of features between the image-based and LiDAR-based networks. On the other hand, we develop two novel distillation modules, Depth-Aware Selective Feature Distillation (DASFD) and Depth-Aware Selective Relation Distillation (DASRD), which selectively learn positive features and relationships of objects by integrating depth uncertainty into feature and relation distillation, respectively. Our approach can be seamlessly integrated into various CNN-based and DETR-based models; we validate it on three recent models on KITTI and one recent model on NuScenes. Extensive experiments show that our approach considerably improves the accuracy of the base models, thereby achieving the best accuracy among all recently released SOTA models. The code is released at https://github.com/DingCodeLab/MonoSTL.
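The abstract describes weighting distillation by depth uncertainty so that only confidently estimated regions transfer knowledge from the LiDAR teacher. The paper's exact DASFD/DASRD formulation is not given in this record; the following is a minimal sketch of the general idea — uncertainty-weighted feature distillation — where the function name and the log-variance depth-uncertainty parameterization are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F


def uncertainty_weighted_feature_distillation(student_feat: torch.Tensor,
                                              teacher_feat: torch.Tensor,
                                              depth_log_var: torch.Tensor) -> torch.Tensor:
    """Distill teacher (LiDAR-branch) features into student (image-branch)
    features, down-weighting spatial locations with high depth uncertainty.

    student_feat, teacher_feat: (B, C, H, W) spatially aligned feature maps.
    depth_log_var: (B, 1, H, W) predicted log-variance of depth per location;
    confidence weight exp(-log_var) shrinks toward 0 where depth is uncertain.
    """
    weight = torch.exp(-depth_log_var)          # per-pixel confidence in [0, inf)
    sq_err = (student_feat - teacher_feat).pow(2)
    return (weight * sq_err).mean()             # confidence-weighted MSE


if __name__ == "__main__":
    torch.manual_seed(0)
    s = torch.randn(2, 64, 8, 8)
    t = torch.randn(2, 64, 8, 8)
    log_var = torch.zeros(2, 1, 8, 8)           # zero log-variance -> weight 1 everywhere
    loss = uncertainty_weighted_feature_distillation(s, t, log_var)
    print(loss.item())
```

With zero log-variance the weight is 1 everywhere and the loss reduces to a plain MSE between the two feature maps; as predicted uncertainty grows at a location, its contribution to the distillation loss decays exponentially, which is one simple way to suppress negative transfer from unreliable depth regions.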
Pages: 9925-9938
Page count: 14
Related Papers
50 items total
  • [1] Cross-Modality Knowledge Distillation Network for Monocular 3D Object Detection
    Hong, Yu
    Dai, Hang
    Ding, Yong
    COMPUTER VISION, ECCV 2022, PT X, 2022, 13670 : 87 - 104
  • [2] Cross-Modality 3D Object Detection
    Zhu, Ming
    Ma, Chao
    Ji, Pan
    Yang, Xiaokang
    2021 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION WACV 2021, 2021, : 3771 - 3780
  • [3] Cascaded Cross-Modality Fusion Network for 3D Object Detection
    Chen, Zhiyu
    Lin, Qiong
    Sun, Jing
    Feng, Yujian
    Liu, Shangdong
    Liu, Qiang
    Ji, Yimu
    Xu, He
    SENSORS, 2020, 20 (24) : 1 - 14
  • [4] UniDistill: A Universal Cross-Modality Knowledge Distillation Framework for 3D Object Detection in Bird's-Eye View
    Zhou, Shengchao
    Liu, Weizhou
    Hu, Chen
    Zhou, Shuchang
    Ma, Chao
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR, 2023, : 5116 - 5125
  • [5] Monocular 3D Object Detection With Motion Feature Distillation
    Hu, Henan
    Li, Muyu
    Zhu, Ming
    Gao, Wen
    Liu, Peiyu
    Chan, Kwok-Leung
    IEEE ACCESS, 2023, 11 : 82933 - 82945
  • [6] Unconstrained Monocular 3D Human Pose Estimation by Action Detection and Cross-modality Regression Forest
    Yu, Tsz-Ho
    Kim, Tae-Kyun
    Cipolla, Roberto
    2013 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2013, : 3642 - 3649
  • [7] A Two-Phase Cross-Modality Fusion Network for Robust 3D Object Detection
    Jiao, Yujun
    Yin, Zhishuai
    SENSORS, 2020, 20 (21) : 1 - 14
  • [8] Learning Occupancy for Monocular 3D Object Detection
    Peng, Liang
    Xu, Junkai
    Cheng, Haoran
    Yang, Zheng
    Wu, Xiaopei
    Qian, Wei
    Wang, Wenxiao
    Wu, Boxi
    Cai, Deng
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2024, : 10281 - 10292
  • [9] Task-Decoupled Knowledge Transfer for Cross-Modality Object Detection
    Wei, Chiheng
    Bai, Lianfa
    Chen, Xiaoyu
    Han, Jing
    ENTROPY, 2023, 25 (08)
  • [10] Dynamic Knowledge Distillation with Cross-Modality Knowledge Transfer
    Wang, Guangzhi
    PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, : 2974 - 2978