Selective Transfer Learning of Cross-Modality Distillation for Monocular 3D Object Detection

Cited by: 1
Authors
Ding, Rui [1]
Yang, Meng [1]
Zheng, Nanning [1]
Affiliations
[1] Xi'an Jiaotong University, Institute of Artificial Intelligence and Robotics, Xi'an 710049, People's Republic of China
Funding
National Science Foundation (USA);
Keywords
Three-dimensional displays; Laser radar; Uncertainty; Object detection; Feature extraction; Estimation; Knowledge engineering; 3D object detection; depth estimation; cross-modality; knowledge distillation; selective transfer;
DOI
10.1109/TCSVT.2024.3405992
Chinese Library Classification (CLC)
TM [Electrical engineering]; TN [Electronics and communication technology];
Discipline classification codes
0808; 0809;
Abstract
Monocular 3D object detection is a promising yet ill-posed task for autonomous vehicles due to the lack of accurate depth information. Cross-modality knowledge distillation can effectively transfer depth information from LiDAR to an image-based network. However, the modality gap between image and LiDAR seriously limits its accuracy. In this paper, we systematically investigate, for the first time, the negative transfer problem induced by the modality gap in cross-modality distillation, including not only the architecture inconsistency issue but, more importantly, the feature overfitting issue. We propose a selective learning approach named MonoSTL to overcome these issues, which encourages positive transfer of depth information from LiDAR while alleviating negative transfer on the image-based network. On the one hand, we utilize similar architectures to ensure spatial alignment of features between the image-based and LiDAR-based networks. On the other hand, we develop two novel distillation modules, namely Depth-Aware Selective Feature Distillation (DASFD) and Depth-Aware Selective Relation Distillation (DASRD), which selectively learn positive features and relationships of objects by integrating depth uncertainty into feature and relation distillation, respectively. Our approach can be seamlessly integrated into various CNN-based and DETR-based models; we validate it with three recent models on KITTI and one recent model on NuScenes. Extensive experiments show that our approach considerably improves the accuracy of the base models and thereby achieves the best accuracy compared with all recently released SOTA models. The code is released at https://github.com/DingCodeLab/MonoSTL.
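The abstract states that DASFD integrates depth uncertainty into feature distillation but gives no formula. The sketch below is one possible reading, not the authors' implementation: it down-weights the student-teacher feature error at locations where the depth uncertainty is high. The function name, the exp(-uncertainty) weighting, and the list-based feature layout are all assumptions made for illustration; the released code at the URL above is the authoritative reference.

```python
import math


def selective_feature_distillation_loss(student, teacher, uncertainty):
    """Uncertainty-weighted mean squared error between two feature sets.

    student, teacher: lists of per-location feature vectors (lists of floats).
    uncertainty: per-location depth uncertainty (non-negative floats).
    Hypothetical helper; the weighting scheme is an assumption.
    """
    if not (len(student) == len(teacher) == len(uncertainty)):
        raise ValueError("inputs must cover the same locations")
    # Assumed weighting: confidence decays exponentially with uncertainty,
    # so uncertain locations contribute little to the distillation loss.
    weights = [math.exp(-u) for u in uncertainty]
    total_weight = sum(weights)
    if total_weight == 0.0:
        return 0.0
    loss = 0.0
    for s, t, w in zip(student, teacher, weights):
        # Mean squared error over the channels of one location.
        squared_error = sum((a - b) ** 2 for a, b in zip(s, t)) / len(s)
        loss += w * squared_error
    # Normalize so the loss scale does not depend on the weights' sum.
    return loss / total_weight
```

With equal uncertainty at every location this reduces to a plain mean squared error; raising the uncertainty at a location shrinks that location's influence toward zero.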
Pages: 9925-9938
Number of pages: 14