Machine learning for scene 3D reconstruction using a single image

Cited by: 1
Author
Knyaz, Vladimir [1, 2]
Affiliations
[1] State Res Inst Aviat Syst GosNIIAS, 7 Victorenko Str, Moscow, Russia
[2] Moscow Inst Phys & Technol MIPT, Moscow, Russia
Funding
Russian Foundation for Basic Research;
Keywords
image analysis; scene 3D reconstruction; voxel 3D model; deep learning; convolutional neural network; generative adversarial neural network; dataset;
DOI
10.1117/12.2556122
Chinese Library Classification
O43 [Optics];
Subject classification codes
070207; 0803;
Abstract
Image-based scene 3D reconstruction is a key task in many machine vision applications, such as scene understanding, object pose estimation, and autonomous navigation. A set of reliable and accurate methods for multi-view scene 3D reconstruction has been developed over the last decades. A significant drawback of these techniques, however, is that they need a large number of images in the processed sequence to obtain an acceptable 3D scene representation. Recently, modern convolutional neural network (CNN) models have achieved the best quality in object recognition, image segmentation, image translation, and other challenging computer vision problems. The paper proposes a convolutional neural network architecture and a technique for training data preparation that together provide prediction of a voxel model of a 3D scene containing several objects. For CNN training and evaluation, a special dataset was collected and annotated. It contains image sequences of several scenes with the corresponding depth images and 3D models of these scenes. The image sequence serves as the primary data for scene 3D reconstruction by the Structure from Motion (SfM) technique. SfM processing yields surface 3D models of all objects in the scene, along with the camera position and orientation for every image in the sequence. The surface 3D model is then transformed into a voxel 3D model and segmented into separate objects. A conditional generative adversarial network architecture was developed for 3D reconstruction from a single image. Its generative part translates an input color image into an output voxel model. The discriminative part distinguishes correct output (close to the real voxel model) from false output (a wrong voxel model). Both parts are trained simultaneously on the prepared dataset. Evaluation on the testing part of the prepared dataset demonstrated the ability to predict 3D models of previously unobserved complex scenes containing several objects. The proposed neural network architecture provides high generalization ability and improved resolution of the predicted voxel 3D models.
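The abstract mentions transforming a surface 3D model into a voxel 3D model but does not say how. The sketch below is one common way to do it, not the paper's procedure: it assumes the surface is available as sampled 3D points and marks every grid cell that contains at least one point. The function name `voxelize`, the `resolution` parameter, and the sample points are hypothetical and chosen only for illustration.

```python
from typing import Iterable, Set, Tuple

Point = Tuple[float, float, float]
Voxel = Tuple[int, int, int]


def voxelize(points: Iterable[Point], resolution: int) -> Set[Voxel]:
    """Map sampled surface points to occupied cells of a resolution^3 grid.

    Illustrative assumption: the surface model is given as a point sample.
    """
    pts = list(points)
    if not pts or resolution < 1:
        return set()
    # Axis-aligned bounding box of the sampled surface.
    mins = [min(p[i] for p in pts) for i in range(3)]
    maxs = [max(p[i] for p in pts) for i in range(3)]
    # One cubic cell size for all axes, so the model keeps its proportions.
    extent = max(maxs[i] - mins[i] for i in range(3))
    cell = extent / resolution if extent > 0 else 1.0
    occupied: Set[Voxel] = set()
    for p in pts:
        # Clamp so points on the upper boundary fall into the last cell.
        idx = tuple(
            min(int((p[i] - mins[i]) / cell), resolution - 1) for i in range(3)
        )
        occupied.add(idx)
    return occupied


# Four sample points spanning a unit cube, voxelized on a 4 x 4 x 4 grid.
corners = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 1.0)]
grid = voxelize(corners, resolution=4)
```

A sparse set of occupied indices keeps the sketch dependency-free; a dense occupancy array would be the more usual input for a neural network.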
Pages: 10
Related papers
50 in total
  • [1] 3D Digital Image Virtual Scene Reconstruction Algorithm Based on Machine Learning
    Xie, Yiyi
    International Journal for Engineering Modelling, 2024, 37 (02) : 23 - 40
  • [2] Panoptic 3D Scene Reconstruction From a Single RGB Image
    Dahnert, Manuel
    Hou, Ji
    Niessner, Matthias
    Dai, Angela
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [3] Learning to Recover 3D Scene Shape from a Single Image
    Yin, Wei
    Zhang, Jianming
    Wang, Oliver
    Niklaus, Simon
    Mai, Long
    Chen, Simon
    Shen, Chunhua
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 204 - 213
  • [4] Stage-Based 3D Scene Reconstruction from Single Image
    Liu, Yixian
    Hao, Pengwei
    Izquierdo, Ebroul
    2012 21ST INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR 2012), 2012, : 1034 - 1037
  • [5] Holistic 3D Scene Parsing and Reconstruction from a Single RGB Image
    Huang, Siyuan
    Qi, Siyuan
    Zhu, Yixin
    Xiao, Yinxue
    Xu, Yuanlu
    Zhu, Song-Chun
    COMPUTER VISION - ECCV 2018, PT VII, 2018, 11211 : 194 - 211
  • [6] 3D scene reconstruction using Kinect
    Morana, M. (marco.morana@unipa.it), 1600, Springer Verlag (260):
  • [7] Learning 3D Scene Semantics and Structure from a Single Depth Image
    Yang, Bo
    Lai, Zihang
    Lu, Xiaoxuan
    Lin, Shuyu
    Wen, Hongkai
    Markham, Andrew
    Trigoni, Niki
    PROCEEDINGS 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPRW), 2018, : 422 - 425
  • [8] 3D Scene Reconstruction with Sparse LiDAR Data and Monocular Image in Single Frame
    Zhong, Yuanxin
    Wang, Sijia
    Xie, Shichao
    Cao, Zhong
    Jiang, Kun
    Yang, Diange
    SAE INTERNATIONAL JOURNAL OF PASSENGER CARS-ELECTRONIC AND ELECTRICAL SYSTEMS, 2018, 11 (01): : 46 - 54
  • [9] CGAN-Based Forest Scene 3D Reconstruction from a Single Image
    Li, Yuan
    Kan, Jiangming
    FORESTS, 2024, 15 (01):
  • [10] Towards Accurate Reconstruction of 3D Scene Shape From A Single Monocular Image
    Yin, Wei
    Zhang, Jianming
    Wang, Oliver
    Niklaus, Simon
    Chen, Simon
    Liu, Yifan
    Shen, Chunhua
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (05) : 6480 - 6494