Cross-Dimensional Refined Learning for Real-Time 3D Visual Perception from Monocular Video

Authors
Hong, Ziyang [1]
Yue, C. Patrick [1]
Affiliations
[1] Hong Kong Univ Sci & Technol, Hong Kong, Peoples R China
DOI
10.1109/ICCVW60793.2023.00231
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We present a novel, real-time-capable learning method that jointly perceives a 3D scene's geometric structure and semantic labels. Recent approaches to real-time 3D scene reconstruction mostly adopt a volumetric scheme in which a Truncated Signed Distance Function (TSDF) is regressed directly. However, these volumetric approaches tend to focus on the global coherence of their reconstructions, which leads to a lack of local geometric detail. To overcome this issue, we propose to leverage the latent geometric prior knowledge in 2D image features, through explicit depth prediction and anchored feature generation, to refine occupancy learning in the TSDF volume. We also find that this cross-dimensional feature refinement methodology can be adopted for the semantic segmentation task by utilizing semantic priors. Hence, we propose an end-to-end cross-dimensional refinement neural network (CDRNet) to extract both 3D meshes and 3D semantic labels in real time. Experimental results show that this method achieves state-of-the-art 3D perception efficiency on multiple datasets, which indicates its great potential for industrial applications.
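For readers unfamiliar with the quantity that the volumetric methods in the abstract regress, the following is a minimal sketch of a truncated signed distance value along a single camera ray. It is not the authors' CDRNet code: the function name `tsdf_value`, its parameters, and the sample depths are hypothetical and chosen only for illustration, and the projective (along-the-ray) distance used here is one common convention among several.

```python
def tsdf_value(surface_depth, voxel_depth, truncation):
    """Truncated signed distance of a voxel along one camera ray.

    Hypothetical helper for illustration only. Positive values lie in
    front of the observed surface, negative values behind it, and the
    result is normalized and clamped to [-1, 1].
    """
    if truncation <= 0:
        raise ValueError("truncation must be positive")
    signed_distance = surface_depth - voxel_depth
    return max(-1.0, min(1.0, signed_distance / truncation))


# Sample voxels along a ray that hits a surface at depth 2.0 (assumed units: metres).
voxel_depths = [1.0, 1.75, 2.0, 2.25, 3.0]
samples = [tsdf_value(2.0, d, truncation=0.5) for d in voxel_depths]
print(samples)
```

The zero crossing of these values marks the surface, which is why a mesh can be extracted from a regressed TSDF volume; voxels far from the surface saturate at -1 or 1 and carry no geometric detail.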
Pages: 2161-2170
Page count: 10