Multimodal Transformer for Automatic 3D Annotation and Object Detection

Times cited: 1
Authors
Liu, Chang [1]
Qian, Xiaoyan [1]
Huang, Binxiao [1]
Qi, Xiaojuan [1]
Lam, Edmund [1]
Tan, Siew-Chong [1]
Wong, Ngai [1]
Affiliation
[1] University of Hong Kong, Pokfulam, Hong Kong, People's Republic of China
Keywords
3D autolabeler; 3D object detection; Multimodal vision; Self-attention; Self-supervision; Transformer
DOI
10.1007/978-3-031-19839-7_38
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Despite a growing number of datasets being collected for training 3D object detection models, significant human effort is still required to annotate 3D boxes on LiDAR scans. To automate the annotation and facilitate the production of various customized datasets, we propose an end-to-end multimodal transformer (MTrans) autolabeler, which leverages both LiDAR scans and images to generate precise 3D box annotations from weak 2D bounding boxes. To alleviate the pervasive sparsity problem that hinders existing autolabelers, MTrans densifies the sparse point clouds by generating new 3D points based on 2D image information. With a multi-task design, MTrans simultaneously segments the foreground/background, densifies LiDAR point clouds, and regresses 3D boxes. Experimental results verify the effectiveness of MTrans in improving the quality of the generated labels. By enriching the sparse point clouds, our method achieves 4.48% and 4.03% higher 3D AP than the state-of-the-art autolabeler on KITTI moderate and hard samples, respectively. MTrans can also be extended to improve the accuracy of 3D object detection, achieving 89.45% AP on KITTI hard samples. Code is available at https://github.com/Cliu2/MTrans.
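A weak 2D box only constrains the object in the image, so a box-conditioned autolabeler first has to find which LiDAR points belong to that box. The following is a minimal sketch of that selection step, not the MTrans implementation: it assumes the points are already expressed in the camera frame and that `P` is a KITTI-style 3×4 camera projection matrix. The function name `points_in_2d_box` is hypothetical.

```python
import numpy as np


def points_in_2d_box(points_xyz, P, box2d, eps=1e-6):
    """Select 3D points whose image projection falls inside a 2D box.

    Hypothetical helper for illustration only.
    points_xyz: (N, 3) points, assumed to be in the camera frame.
    P: (3, 4) camera projection matrix (KITTI-style, an assumption).
    box2d: (x1, y1, x2, y2) weak 2D bounding box in pixel coordinates.
    Returns the selected (M, 3) points and their (M, 2) pixel coordinates.
    """
    n = points_xyz.shape[0]
    # Homogeneous coordinates: append a column of ones -> (N, 4).
    homo = np.hstack([points_xyz, np.ones((n, 1))])
    # Project into the image plane -> (N, 3) rows of (u*z, v*z, z).
    proj = homo @ P.T
    depth = proj[:, 2]
    # Keep only points in front of the camera; others cannot be visible.
    in_front = depth > eps
    uv = np.zeros((n, 2))
    uv[in_front] = proj[in_front, :2] / depth[in_front, None]
    x1, y1, x2, y2 = box2d
    inside = (
        in_front
        & (uv[:, 0] >= x1) & (uv[:, 0] <= x2)
        & (uv[:, 1] >= y1) & (uv[:, 1] <= y2)
    )
    return points_xyz[inside], uv[inside]


if __name__ == "__main__":
    # Toy camera: focal length 100 px, principal point (50, 50).
    P = np.array([[100.0, 0.0, 50.0, 0.0],
                  [0.0, 100.0, 50.0, 0.0],
                  [0.0, 0.0, 1.0, 0.0]])
    pts = np.array([[0.0, 0.0, 10.0],   # projects to (50, 50)
                    [1.0, 0.0, 10.0],   # projects to (60, 50)
                    [0.0, 0.0, -5.0],   # behind the camera
                    [5.0, 0.0, 10.0]])  # projects to (100, 50), outside
    sel, uv = points_in_2d_box(pts, P, (40, 40, 70, 70))
    print(sel.shape, uv)
```

Comparing the number of selected points with the number of pixels inside the box gives a rough sense of the sparsity that the abstract describes; for distant or occluded objects, only a handful of points may survive, which is the situation MTrans's image-guided point generation is meant to improve.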
Pages: 657–673
Number of pages: 17