SST: Real-time End-to-end Monocular 3D Reconstruction via Sparse Spatial-Temporal Guidance

Times cited: 1
Authors
Zhang, Chenyangguang [1]
Lou, Zhiqiang [1]
Di, Yan [2]
Tombari, Federico [2, 3]
Ji, Xiangyang [1]
Affiliations
[1] Tsinghua Univ, Beijing, Peoples R China
[2] Tech Univ Munich, Munich, Germany
[3] Google, Munich, Germany
Funding
National Key R&D Program of China
Keywords
3D reconstruction; real time; visual SLAM guidance
DOI
10.1109/ICME55011.2023.00348
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Real-time monocular 3D reconstruction is a challenging problem that remains unsolved. Although recent end-to-end methods demonstrate promising results, they rarely capture tiny structures and geometric boundaries, because their supervision neglects spatial details and their oversimplified feature fusion ignores temporal cues. To address these problems, we propose SST, an end-to-end 3D reconstruction network that uses Sparse points estimated by a visual SLAM system as additional Spatial guidance and fuses Temporal features via a cross-modal attention mechanism, achieving more detailed reconstruction results. We propose a Local Spatial-Temporal Fusion module to exploit more informative spatial-temporal cues from multi-view color information and sparse priors, as well as a Global Spatial-Temporal Fusion module that refines the local TSDF volumes with the world-frame model from coarse to fine. Extensive experiments on ScanNet and 7-Scenes demonstrate that SST outperforms all state-of-the-art competitors while keeping a high inference speed of 59 FPS, enabling real-world applications with real-time requirements.
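The abstract names a cross-modal attention mechanism for fusing temporal image features with sparse SLAM points but does not give its formulation. The following is a generic scaled dot-product cross-attention sketch, not SST's Local Spatial-Temporal Fusion module: the per-voxel query features, the per-point key/value features, the identity (unlearned) projections, and the residual add are all assumptions, and the function names (`softmax`, `cross_attention`, `fuse_voxel_with_points`) are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a non-empty list of scores."""
    m = max(scores)  # subtract the max so math.exp cannot overflow
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: List[Vector], keys: List[Vector],
                    values: List[Vector]) -> List[Vector]:
    """Scaled dot-product cross-attention: each query attends over all keys.

    Learned query/key/value projections are omitted (identity), an
    assumption made to keep the sketch short.
    """
    if not keys or len(keys) != len(values):
        raise ValueError("keys and values must be non-empty and equally long")
    scale = 1.0 / math.sqrt(len(keys[0]))
    dim_v = len(values[0])
    out: List[Vector] = []
    for q in queries:
        # Similarity of this query to every key, scaled by 1/sqrt(d).
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        weights = softmax(scores)
        # Attention output: convex combination of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(dim_v)])
    return out


def fuse_voxel_with_points(voxel_feats: List[Vector],
                           point_feats: List[Vector]) -> List[Vector]:
    """Hypothetical fusion step: voxel features (queries) attend to sparse
    SLAM-point features (used as both keys and values), and the attended
    result is added back as a residual. Assumes equal feature dimensions."""
    attended = cross_attention(voxel_feats, point_feats, point_feats)
    return [[x + a for x, a in zip(vf, av)]
            for vf, av in zip(voxel_feats, attended)]


if __name__ == "__main__":
    voxels = [[1.0, 0.0], [0.0, 1.0]]              # toy per-voxel image features
    points = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # toy sparse-point features
    print(fuse_voxel_with_points(voxels, points))
```

With these toy inputs, each voxel feature is shifted toward the sparse-point features it is most similar to. A practical system would likely learn the projections and restrict each voxel to spatially nearby SLAM points; both are left out here for brevity.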
Pages: 2033-2038
Page count: 6