3D-Scene-Former: 3D scene generation from a single RGB image using Transformers

Cited by: 0
Authors
Chatterjee, Jit [1]
Vega, Maria Torres [1]
Affiliations
[1] Katholieke Univ Leuven, Dept Elect Engn ESAT, eMedia Res Lab, B-3000 Leuven, Belgium
Source
Keywords
3D scene understanding; Mesh generation; 3D object detection; 3D pose estimation; Transformers; Recognition
DOI
10.1007/s00371-024-03573-2
Chinese Library Classification number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
3D scene generation requires complex hardware setups, such as multiple cameras and depth sensors. To address this challenge, 3D scenes need to be generated from a single RGB image by understanding the spatio-contextual information inside the scene. However, generating 3D scenes from a single RGB image is a formidable undertaking because depth information is missing. Moreover, the scene must be generated from various angles and positions, which requires extrapolating from the limited information in a single image. Current state-of-the-art techniques extract global and local features from the 2D scene and employ a combined estimation strategy to tackle this challenge. However, existing approaches still struggle to estimate 3D parameters accurately, especially under the strong occlusions found in cluttered environments. In this paper, we propose 3D-Scene-Former, a novel solution that generates 3D indoor scenes from a single RGB image and refines the initial estimations using a Transformer network. We evaluated our approach on two well-known datasets, benchmarking it against state-of-the-art solutions. Our method outperforms the state of the art in 3D object detection and 3D pose estimation by a margin of 11.37%. 3D-Scene-Former opens new avenues for 3D content creation, transforming a single RGB image into realistic 3D scenes through the use of interconnected mesh structures.
Pages: 15