SST: Real-time End-to-end Monocular 3D Reconstruction via Sparse Spatial-Temporal Guidance

Cited by: 1
Authors
Zhang, Chenyangguang [1]
Lou, Zhiqiang [1]
Di, Yan [2]
Tombari, Federico [2, 3]
Ji, Xiangyang [1]
Affiliations
[1] Tsinghua Univ, Beijing, Peoples R China
[2] Tech Univ Munich, Munich, Germany
[3] Google, Munich, Germany
Funding
National Key Research and Development Program of China;
Keywords
3D reconstruction; real time; visual SLAM guidance;
DOI
10.1109/ICME55011.2023.00348
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104; 0812; 0835; 1405;
Abstract
Real-time monocular 3D reconstruction is a challenging problem that remains unsolved. Although recent end-to-end methods demonstrate promising results, they hardly capture tiny structures and geometric boundaries, because their insufficient supervision neglects spatial details and their oversimplified feature fusion ignores temporal cues. To address these problems, we propose SST, an end-to-end 3D reconstruction network that utilizes Sparse estimated points from a visual SLAM system as additional Spatial guidance and fuses Temporal features via a cross-modal attention mechanism, achieving more detailed reconstruction results. We propose a Local Spatial-Temporal Fusion module to exploit more informative spatial-temporal cues from multi-view color information and sparse priors, as well as a Global Spatial-Temporal Fusion module to refine the local TSDF volumes with the world-frame model from coarse to fine. Extensive experiments on ScanNet and 7-Scenes demonstrate that SST outperforms all state-of-the-art competitors while maintaining a high inference speed of 59 FPS, enabling real-world applications with real-time requirements.
Pages: 2033-2038
Number of pages: 6