Multi-view PointNet for 3D Scene Understanding

Cited by: 77
Authors
Jaritz, Maximilian [1]
Gu, Jiayuan [2]
Su, Hao [2]
Affiliations
[1] INRIA, Valeo, Rocquencourt, France
[2] Univ Calif San Diego, San Diego, CA USA
DOI
10.1109/ICCVW.2019.00494
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Fusion of 2D images and 3D point clouds is important because information from dense images can enhance sparse point clouds. However, fusion is challenging because 2D and 3D data live in different spaces. In this work, we propose MVPNet (Multi-View PointNet), where we aggregate 2D multi-view image features into 3D point clouds, and then use a point based network to fuse the features in 3D canonical space to predict 3D semantic labels. To this end, we introduce view selection along with a 2D-3D feature aggregation module. Extensive experiments show the benefit of leveraging features from dense images and reveal superior robustness to varying point cloud density compared to 3D-only methods. On the ScanNetV2 [4] benchmark, our MVPNet significantly outperforms prior point cloud based approaches on the task of 3D Semantic Segmentation. It is much faster to train than the large networks of the sparse voxel approach [6]. We provide solid ablation studies to ease the future design of 2D-3D fusion methods and their extension to other tasks, as we showcase for 3D instance segmentation.
Pages: 3995-4003
Page count: 9