Multimodal Transformer for Automatic 3D Annotation and Object Detection

Cited by: 7
Authors
Liu, Chang [1 ]
Qian, Xiaoyan [1 ]
Huang, Binxiao [1 ]
Qi, Xiaojuan [1 ]
Lam, Edmund [1 ]
Tan, Siew-Chong [1 ]
Wong, Ngai [1 ]
Affiliations
[1] Univ Hong Kong, Pokfulam, Hong Kong, Peoples R China
Source
Computer Vision - ECCV 2022 (Lecture Notes in Computer Science)
Keywords
3D autolabeler; 3D object detection; Multimodal vision; Self-attention; Self-supervision; Transformer
DOI
10.1007/978-3-031-19839-7_38
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Despite a growing number of datasets being collected for training 3D object detection models, significant human effort is still required to annotate 3D boxes on LiDAR scans. To automate the annotation and facilitate the production of various customized datasets, we propose an end-to-end multimodal transformer (MTrans) autolabeler, which leverages both LiDAR scans and images to generate precise 3D box annotations from weak 2D bounding boxes. To alleviate the pervasive sparsity problem that hinders existing autolabelers, MTrans densifies the sparse point clouds by generating new 3D points based on 2D image information. With a multi-task design, MTrans segments the foreground/background, densifies LiDAR point clouds, and regresses 3D boxes simultaneously. Experimental results verify the effectiveness of MTrans for improving the quality of the generated labels. By enriching the sparse point clouds, our method achieves 4.48% and 4.03% better 3D AP on KITTI moderate and hard samples, respectively, versus the state-of-the-art autolabeler. MTrans can also be extended to improve 3D object detection accuracy, resulting in a remarkable 89.45% AP on KITTI hard samples. Code is available at https://github.com/Cliu2/MTrans.
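The multi-task design described in the abstract can be made concrete with a small sketch. The PyTorch snippet below shows how one shared set of fused per-point features can feed three heads at once: foreground/background segmentation, point densification via predicted offsets, and 3D box regression. Everything here is an illustrative assumption, not the authors' architecture: the class name MultiTaskHead, the feature dimension, the offset-based densification, and the foreground-weighted pooling are all hypothetical; the actual implementation is in the repository linked above.

    import torch
    import torch.nn as nn

    class MultiTaskHead(nn.Module):
        """Illustrative three-head module: segment, densify, regress a box.

        A minimal sketch of the multi-task idea in the abstract,
        NOT the actual MTrans architecture.
        """
        def __init__(self, feat_dim: int = 256):
            super().__init__()
            self.seg_head = nn.Linear(feat_dim, 1)      # per-point fg/bg logit
            self.densify_head = nn.Linear(feat_dim, 3)  # per-point 3D offset
            self.box_head = nn.Linear(feat_dim, 7)      # (x, y, z, l, w, h, yaw)

        def forward(self, point_feats, points):
            # point_feats: (N, feat_dim) fused LiDAR+image features per point
            # points:      (N, 3) LiDAR coordinates of the same points
            fg_logits = self.seg_head(point_feats).squeeze(-1)    # (N,)
            new_points = points + self.densify_head(point_feats)  # (N, 3)
            # Pool features weighted by predicted foreground probability,
            # so the box regressor focuses on the object's own points.
            w = fg_logits.sigmoid().unsqueeze(-1)                 # (N, 1)
            pooled = (point_feats * w).sum(0) / w.sum().clamp(min=1e-6)
            box = self.box_head(pooled)                           # (7,)
            return fg_logits, new_points, box

    # Usage with random stand-in data (1024 points, 256-dim features):
    head = MultiTaskHead(feat_dim=256)
    fg, dense_pts, box = head(torch.randn(1024, 256), torch.randn(1024, 3))

Sharing one feature encoder across the three heads is the usual multi-task rationale: the segmentation and densification objectives act as auxiliary supervision that regularizes the features the box regressor consumes.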
Pages: 657-673
Number of pages: 17
Related Papers (showing records 31-40 of 50)
  • [31] Zhao, Lichen; Guo, Jinyang; Xu, Dong; Sheng, Lu. Transformer3D-Det: Improving 3D Object Detection by Vote Refinement. IEEE Transactions on Circuits and Systems for Video Technology, 2021, 31(12): 4735-4746
  • [32] Rodrigues, Rui; Jurgens, Stephan; Fernandes, Carla; Diogo, Joao; Correia, Nuno. Integrating 3D Objects in Multimodal Video Annotation. Proceedings of the 2022 ACM International Conference on Interactive Media Experiences (IMX 2022), 2022: 299-304
  • [33] Wang, Jie; Luo, Guiyang; Yuan, Quan; Li, Jinglin. F-Transformer: Point Cloud Fusion Transformer for Cooperative 3D Object Detection. Artificial Neural Networks and Machine Learning - ICANN 2022, Pt. I, 2022, 13529: 171-182
  • [34] Lu, Chenguang; Yue, Kang; Liu, Yue. ReAGFormer: Reaggregation Transformer with Affine Group Features for 3D Object Detection. Computer Vision - ACCV 2022, Pt. I, 2023, 13841: 262-279
  • [35] Pan, Jincheng; Huang, Xiaoci; Luo, Suyun; Ma, Fang. BEV transformer for visual 3D object detection applied with retentive mechanism. Transactions of the Institute of Measurement and Control, 2025
  • [36] Leng, Zhaoqi; Sun, Pei; He, Tong; Anguelov, Dragomir; Tan, Mingxing. PVTransformer: Point-to-Voxel Transformer for Scalable 3D Object Detection. 2024 IEEE International Conference on Robotics and Automation (ICRA 2024), 2024: 4238-4244
  • [37] Ai, Lingmei; Xie, Zhuoyu; Yao, Ruoxia; Yang, Mengyao. MVTr: multi-feature voxel transformer for 3D object detection. The Visual Computer, 2024, 40(3): 1453-1466
  • [38] Li, Jun; Zhang, Han; Wu, Zizhang; Xu, Tianhao. Radar-camera fusion for 3D object detection with aggregation transformer. Applied Intelligence, 2024, 54(21): 10627-10639
  • [39] Huang, Kuan-Chih; Wu, Tsung-Han; Su, Hung-Ting; Hsu, Winston H. MonoDTR: Monocular 3D Object Detection with Depth-Aware Transformer. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2022), 2022: 4002-4011
  • [40] Li, Yanwei; Chen, Yilun; Qi, Xiaojuan; Li, Zeming; Sun, Jian; Jia, Jiaya. Unifying Voxel-based Representation with Transformer for 3D Object Detection. Advances in Neural Information Processing Systems 35 (NeurIPS 2022), 2022