Multimodal Transformer for Automatic 3D Annotation and Object Detection

Cited by: 7
Authors
Liu, Chang [1]
Qian, Xiaoyan [1]
Huang, Binxiao [1]
Qi, Xiaojuan [1]
Lam, Edmund [1]
Tan, Siew-Chong [1]
Wong, Ngai [1]
Affiliations
[1] Univ Hong Kong, Pokfulam, Hong Kong, Peoples R China
Source
Keywords
3D autolabeler; 3D object detection; Multimodal vision; Self-attention; Self-supervision; Transformer
DOI
10.1007/978-3-031-19839-7_38
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Despite a growing number of datasets being collected for training 3D object detection models, significant human effort is still required to annotate 3D boxes on LiDAR scans. To automate the annotation and facilitate the production of various customized datasets, we propose an end-to-end multimodal transformer (MTrans) autolabeler, which leverages both LiDAR scans and images to generate precise 3D box annotations from weak 2D bounding boxes. To alleviate the pervasive sparsity problem that hinders existing autolabelers, MTrans densifies the sparse point clouds by generating new 3D points based on 2D image information. With a multi-task design, MTrans segments the foreground/background, densifies LiDAR point clouds, and regresses 3D boxes simultaneously. Experimental results verify the effectiveness of MTrans in improving the quality of the generated labels. By enriching the sparse point clouds, our method achieves 4.48% and 4.03% better 3D AP on KITTI moderate and hard samples, respectively, than the state-of-the-art autolabeler. MTrans can also be extended to improve the accuracy of 3D object detection, reaching 89.45% AP on KITTI hard samples. Code is available at https://github.com/Cliu2/MTrans.
Pages: 657-673
Number of pages: 17
Related papers
50 in total
  • [41] Fusion information enhanced method based on transformer for 3D object detection
    Jin Y.
    Tao C.
    Yi Qi Yi Biao Xue Bao/Chinese Journal of Scientific Instrument, 2023, 44 (12) : 297 - 306
  • [42] Cross Modal Transformer: Towards Fast and Robust 3D Object Detection
    Yan, Junjie
    Liu, Yingfei
    Sun, Jianjian
    Jia, Fan
    Li, Shuailin
    Wang, Tiancai
    Zhang, Xiangyu
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 18222 - 18232
  • [43] Monocular 3D Object Detection for Autonomous Driving Based on Contextual Transformer
    She, Xiangyang
    Yan, Weijia
    Dong, Lihong
    Computer Engineering and Applications, 2024, 60 (19) : 178 - 189
  • [44] SWFormer: Sparse Window Transformer for 3D Object Detection in Point Clouds
    Sun, Pei
    Tan, Mingxing
    Wang, Weiyue
    Liu, Chenxi
    Xia, Fei
    Leng, Zhaoqi
    Anguelov, Dragomir
    COMPUTER VISION, ECCV 2022, PT X, 2022, 13670 : 426 - 442
  • [45] MVTr: multi-feature voxel transformer for 3D object detection
    Lingmei Ai
    Zhuoyu Xie
    Ruoxia Yao
    Mengyao Yang
    The Visual Computer, 2024, 40 : 1453 - 1466
  • [46] MonoDETR: Depth-guided Transformer for Monocular 3D Object Detection
    Zhang, Renrui
    Qiu, Han
    Wang, Tai
    Guo, Ziyu
    Cui, Ziteng
    Qiao, Yu
    Li, Hongsheng
    Gao, Peng
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 9121 - 9132
  • [47] MonoATT: Online Monocular 3D Object Detection with Adaptive Token Transformer
    Zhou, Yunsong
    Zhu, Hongzi
    Liu, Quan
    Chang, Shan
    Guo, Minyi
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 17493 - 17503
  • [48] Multimodal 3D Object Detection Based on Sparse Interaction in Internet of Vehicles
    Li, Hui
    Ge, Tongao
    Bai, Keqiang
    Nie, Gaofeng
    Xu, Lingwei
    Ai, Xiaoxue
    Cao, Song
    IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, 2025, 74 (02) : 2174 - 2186
  • [49] VirPNet: A Multimodal Virtual Point Generation Network for 3D Object Detection
    Wang, Lin
    Sun, Shiliang
    Zhao, Jing
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 10597 - 10609
  • [50] Transformer-Based Global PointPillars 3D Object Detection Method
    Zhang, Lin
    Meng, Hua
    Yan, Yunbing
    Xu, Xiaowei
    ELECTRONICS, 2023, 12 (14)