SST: Real-time End-to-end Monocular 3D Reconstruction via Sparse Spatial-Temporal Guidance

Cited by: 1
Authors
Zhang, Chenyangguang [1]
Lou, Zhiqiang [1]
Di, Yan [2]
Tombari, Federico [2, 3]
Ji, Xiangyang [1]
Affiliations
[1] Tsinghua Univ, Beijing, Peoples R China
[2] Tech Univ Munich, Munich, Germany
[3] Google, Munich, Germany
Funding
National Key Research and Development Program of China;
Keywords
3D reconstruction; real time; visual SLAM guidance;
DOI
10.1109/ICME55011.2023.00348
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Real-time monocular 3D reconstruction is a challenging problem that remains unsolved. Although recent end-to-end methods demonstrate promising results, they hardly capture tiny structures and geometric boundaries, because their insufficient supervision neglects spatial details and their oversimplified feature fusion ignores temporal cues. To address these problems, we propose an end-to-end 3D reconstruction network, SST, which utilizes Sparse estimated points from a visual SLAM system as additional Spatial guidance and fuses Temporal features via a cross-modal attention mechanism, achieving more detailed reconstruction results. We propose a Local Spatial-Temporal Fusion module to exploit more informative spatial-temporal cues from multi-view color information and sparse priors, as well as a Global Spatial-Temporal Fusion module to refine the local TSDF volumes with the world-frame model from coarse to fine. Extensive experiments on ScanNet and 7-Scenes demonstrate that SST outperforms all state-of-the-art competitors, whilst keeping a high inference speed at 59 FPS, enabling real-world applications with real-time requirements.
Pages: 2033 - 2038
Page count: 6
Related papers
50 in total
  • [31] End-to-end joint spectral-spatial compression and reconstruction of hyperspectral images using a 3D convolutional autoencoder
    Chong, Yanwen
    Chen, Linwei
    Pan, Shaoming
    JOURNAL OF ELECTRONIC IMAGING, 2021, 30 (04)
  • [32] NeuralRecon: Real-Time Coherent 3D Scene Reconstruction From Monocular Video
    Chen, Xi
    Sun, Jiaming
    Xie, Yiming
    Bao, Hujun
    Zhou, Xiaowei
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2024, 46 (12) : 7542 - 7555
  • [33] End-to-End 3D Neuroendoscopic Video Reconstruction for Robot-Assisted Ventriculostomy
    Vagdargi, P.
    Uneri, A.
    Liu, S.
    Jones, C. K.
    Sisniega, A.
    Lee, J.
    Helm, P. A.
    Anderson, W. S.
    Luciano, M.
    Hager, G. D.
    Siewerdsen, J. H.
    IMAGE-GUIDED PROCEDURES, ROBOTIC INTERVENTIONS, AND MODELING, MEDICAL IMAGING 2024, 2024, 12928
  • [34] Mobile3DRecon: Real-time Monocular 3D Reconstruction on a Mobile Phone
    Yang, Xingbin
    Zhou, Liyang
    Jiang, Hanqing
    Tang, Zhongliang
    Wang, Yuanbo
    Bao, Hujun
    Zhang, Guofeng
    IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, 2020, 26 (12) : 3446 - 3456
  • [35] AGMFusion: A Real-Time End-to-End Infrared and Visible Image Fusion Network Based on Adaptive Guidance Module
    Liu, Shenghao
    Lan, Xiaoxiong
    Chen, Wenyong
    Zhang, Zhiyong
    Qiu, Changzhen
    IEEE SENSORS JOURNAL, 2024, 24 (17) : 28338 - 28350
  • [36] Looking Around Flatland: End-to-End 2D Real-Time NLOS Imaging
    Pena, Maria
    Gutierrez, Diego
    Marco, Julio
    IEEE TRANSACTIONS ON COMPUTATIONAL IMAGING, 2025, 11 : 189 - 200
  • [37] Real-time continuous detection and recognition of dynamic hand gestures in untrimmed sequences based on end-to-end architecture with 3D DenseNet and LSTM
    Lu, Zhi
    Qin, Shiyin
    Lv, Pin
    Sun, Liguo
    Tang, Bo
    MULTIMEDIA TOOLS AND APPLICATIONS, 2024, 83 : 16275 - 16312
  • [38] Real-time continuous detection and recognition of dynamic hand gestures in untrimmed sequences based on end-to-end architecture with 3D DenseNet and LSTM
    Lu, Zhi
    Qin, Shiyin
    Lv, Pin
    Sun, Liguo
    Tang, Bo
    MULTIMEDIA TOOLS AND APPLICATIONS, 2024, 83 (06) : 16275 - 16312
  • [39] Real-time 3D video Acquisition and Auto-stereoscopic Display End-to-End Algorithm Based on Tiled Multi-projectors
    Guo, Huayuan
    Qin, Kaihuai
    Sun, Feng
    2015 5TH INTERNATIONAL CONFERENCE ON VIRTUAL REALITY AND VISUALIZATION (ICVRV 2015), 2015, : 318 - 323
  • [40] 3D GEOMETRY DESIGN VIA END-TO-END OPTIMIZATION FOR LAND SEISMIC ACQUISITION
    Hernandez-Rojas, Alejandra
    Arguello, Henry
    2022 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2022, : 4053 - 4057