Multimodal Transformer for Automatic 3D Annotation and Object Detection

Cited by: 7

Authors:

Liu, Chang [1]
Qian, Xiaoyan [1]
Huang, Binxiao [1]
Qi, Xiaojuan [1]
Lam, Edmund [1]
Tan, Siew-Chong [1]
Wong, Ngai [1]

Affiliations:

[1] University of Hong Kong, Pokfulam, Hong Kong, People's Republic of China

Source:

COMPUTER VISION, ECCV 2022, PT XXXVIII | 2022 / Vol. 13698

Keywords:

3D autolabeler; 3D object detection; Multimodal vision; Self-attention; Self-supervision; Transformer

DOI:

10.1007/978-3-031-19839-7_38

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

Despite the growing number of datasets collected for training 3D object detection models, annotating 3D boxes on LiDAR scans still requires significant human effort. To automate the annotation and facilitate the production of various customized datasets, we propose an end-to-end multimodal transformer (MTrans) autolabeler, which leverages both LiDAR scans and images to generate precise 3D box annotations from weak 2D bounding boxes. To alleviate the pervasive sparsity problem that hinders existing autolabelers, MTrans densifies the sparse point clouds by generating new 3D points based on 2D image information. With a multi-task design, MTrans segments the foreground/background, densifies LiDAR point clouds, and regresses 3D boxes simultaneously. Experimental results verify the effectiveness of MTrans in improving the quality of the generated labels. By enriching the sparse point clouds, our method achieves 4.48% and 4.03% better 3D AP on KITTI moderate and hard samples, respectively, than the state-of-the-art autolabeler. MTrans can also be extended to improve the accuracy of 3D object detection, resulting in a remarkable 89.45% AP on KITTI hard samples. Code is available at https://github.com/Cliu2/MTrans.
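As an illustration of how a weak 2D box can pick out the LiDAR points belonging to one object, the sketch below projects 3D points into the image and keeps those that land inside the box. This is a common preprocessing step, assumed here purely for illustration; it is not the MTrans implementation. The names project_point and points_in_box, the toy projection matrix P, and the sample points are all hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Point3D = Tuple[float, float, float]
Box2D = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels


def project_point(P: Sequence[Sequence[float]], point: Point3D) -> Optional[Tuple[float, float]]:
    """Project a 3D point (camera coordinates) with a 3x4 matrix P.

    Returns pixel coordinates (u, v), or None if the point is not in front
    of the camera.
    """
    x, y, z = point
    h = (x, y, z, 1.0)  # homogeneous coordinates
    u_h, v_h, w = (sum(row[i] * h[i] for i in range(4)) for row in P)
    if w <= 0:
        return None
    return u_h / w, v_h / w


def points_in_box(P: Sequence[Sequence[float]], points: Sequence[Point3D], box: Box2D) -> List[Point3D]:
    """Keep the 3D points whose image projection falls inside the 2D box."""
    x_min, y_min, x_max, y_max = box
    kept = []
    for p in points:
        uv = project_point(P, p)
        if uv is None:
            continue
        u, v = uv
        if x_min <= u <= x_max and y_min <= v <= y_max:
            kept.append(p)
    return kept


# Toy camera (assumed): focal length 100 px, principal point (50, 50).
P = [
    [100.0, 0.0, 50.0, 0.0],
    [0.0, 100.0, 50.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]
points = [(0.0, 0.0, 10.0), (5.0, 0.0, 10.0), (0.0, 0.0, -5.0)]
box = (40.0, 40.0, 60.0, 60.0)

selected = points_in_box(P, points, box)
print(selected)  # [(0.0, 0.0, 10.0)]
```

Only the first point projects to (50, 50), inside the box; the second projects to (100, 50), and the third lies behind the camera. With real sensors, such a crop typically yields very few points on distant objects, which is the sparsity problem the abstract refers to.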


Pages: 657-673

Page count: 17
