Multimodal Transformer for Automatic 3D Annotation and Object Detection

Times cited: 1
Authors
Liu, Chang [1]
Qian, Xiaoyan [1]
Huang, Binxiao [1]
Qi, Xiaojuan [1]
Lam, Edmund [1]
Tan, Siew-Chong [1]
Wong, Ngai [1]
Affiliation
[1] University of Hong Kong, Pokfulam, Hong Kong, People's Republic of China
Keywords
3D autolabeler; 3D object detection; Multimodal vision; Self-attention; Self-supervision; Transformer
DOI
10.1007/978-3-031-19839-7_38
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Despite the growing number of datasets collected for training 3D object detection models, annotating 3D boxes on LiDAR scans still requires significant human effort. To automate annotation and facilitate the production of customized datasets, we propose an end-to-end multimodal transformer (MTrans) autolabeler. It leverages both LiDAR scans and images to generate precise 3D box annotations from weak 2D bounding boxes. Existing autolabelers are hindered by a pervasive sparsity problem. To alleviate it, MTrans densifies sparse point clouds by generating new 3D points from 2D image information. With a multi-task design, MTrans simultaneously segments foreground from background, densifies LiDAR point clouds and regresses 3D boxes. Experimental results verify that MTrans improves the quality of the generated labels. By enriching the sparse point clouds, our method achieves 4.48% and 4.03% higher 3D AP than the state-of-the-art autolabeler on KITTI moderate and hard samples, respectively. MTrans can also be extended to improve the accuracy of 3D object detection, reaching 89.45% AP on KITTI hard samples. Code is available at https://github.com/Cliu2/MTrans.
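The abstract describes combining LiDAR points with 2D image boxes and generating new 3D points from image information. Both steps rest on standard pinhole-camera geometry. The sketch below illustrates that geometry only, under stated assumptions; it is not MTrans's own code.

- `points_in_box` selects the LiDAR points whose projection falls inside a weak 2D box.
- `back_project` lifts a pixel to 3D when its depth is known.

The following are assumptions not taken from the source:

- Points are already expressed in the camera frame (x right, y down, z forward).
- The camera is an undistorted pinhole with intrinsics `fx, fy, cx, cy`.
- The depth of each new pixel is simply given. The paper's actual mechanism for placing generated points is not reproduced here.

All function names are hypothetical.

```python
from typing import List, Tuple

# Hypothetical helpers illustrating generic pinhole geometry, not the MTrans implementation.
Point3D = Tuple[float, float, float]
Box2D = Tuple[float, float, float, float]  # (u_min, v_min, u_max, v_max) in pixels


def project_to_image(p: Point3D, fx: float, fy: float, cx: float, cy: float) -> Tuple[float, float]:
    """Project a camera-frame point onto the image plane (assumes no lens distortion)."""
    x, y, z = p
    if z <= 0:
        raise ValueError("point is behind the camera")
    return (fx * x / z + cx, fy * y / z + cy)


def points_in_box(points: List[Point3D], box: Box2D,
                  fx: float, fy: float, cx: float, cy: float) -> List[Point3D]:
    """Keep points in front of the camera whose projection lies inside the 2D box."""
    u_min, v_min, u_max, v_max = box
    kept = []
    for p in points:
        if p[2] <= 0:  # skip points behind the camera
            continue
        u, v = project_to_image(p, fx, fy, cx, cy)
        if u_min <= u <= u_max and v_min <= v <= v_max:
            kept.append(p)
    return kept


def back_project(u: float, v: float, depth: float,
                 fx: float, fy: float, cx: float, cy: float) -> Point3D:
    """Lift pixel (u, v) with an assumed-known depth to a camera-frame 3D point."""
    return ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)


if __name__ == "__main__":
    intrinsics = (700.0, 700.0, 600.0, 180.0)  # illustrative values only
    scan = [(1.0, 0.5, 10.0), (-5.0, 0.0, 10.0), (0.0, 0.0, -3.0)]
    print(points_in_box(scan, (550.0, 150.0, 700.0, 250.0), *intrinsics))
    print(back_project(670.0, 215.0, 10.0, *intrinsics))
```

In the usage example, only the first point projects into the box, at pixel (670, 215). Back-projecting that pixel at depth 10 recovers the original point. A densifying autolabeler must estimate that depth (or the full 3D position) for image pixels that have no LiDAR return. This is where learned components such as MTrans's transformer come in.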
Pages: 657–673
Number of pages: 17