Deep Sliding Shapes for Amodal 3D Object Detection in RGB-D Images

Cited by: 412
Authors
Song, Shuran [1]
Xiao, Jianxiong [1]
Affiliations
[1] Princeton Univ, Princeton, NJ 08544 USA
Funding
National Science Foundation (USA)
DOI
10.1109/CVPR.2016.94
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We focus on the task of amodal 3D object detection in RGB-D images, which aims to produce a 3D bounding box of an object in metric form at its full extent. We introduce Deep Sliding Shapes, a 3D ConvNet formulation that takes a 3D volumetric scene from an RGB-D image as input and outputs 3D object bounding boxes. In our approach, we propose the first 3D Region Proposal Network (RPN) to learn objectness from geometric shapes and the first joint Object Recognition Network (ORN) to extract geometric features in 3D and color features in 2D. In particular, we handle objects of various sizes by training an amodal RPN at two different scales and an ORN to regress 3D bounding boxes. Experiments show that our algorithm outperforms the state of the art by 13.8 in mAP and is 200x faster than the original Sliding Shapes.
Pages: 808-816
Page count: 9