Joint Semantic Segmentation and 3D Reconstruction from Monocular Video

Cited by: 0
Authors
Kundu, Abhijit [1]
Li, Yin [1]
Dellaert, Frank [1]
Li, Fuxin [1]
Rehg, James M. [1]
Affiliations
[1] Georgia Inst Technol, Atlanta, GA 30332 USA
Source
Keywords
RECOGNITION;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We present an approach for joint inference of 3D scene structure and semantic labeling from monocular video. Starting from a monocular image stream, our framework produces a 3D volumetric semantic + occupancy map, which is much more useful than a series of 2D semantic label images or a sparse point cloud, as produced by traditional semantic segmentation and Structure from Motion (SfM) pipelines, respectively. We derive a Conditional Random Field (CRF) model defined in 3D space that jointly infers the semantic category and occupancy of each voxel. Such joint inference in the 3D CRF paves the way for more informed priors and constraints, which are not possible when the two problems are solved separately in their traditional frameworks. We make use of class-specific semantic cues that constrain the 3D structure in areas where multiview constraints are weak. Our model comprises higher-order factors, which help when the depth is unobservable. We also make use of class-specific semantic cues to reduce the degree of such higher-order factors, or to approximately model them with unaries where possible. We demonstrate improved 3D structure and temporally consistent semantic segmentation for difficult, large-scale, forward-moving monocular image sequences.
Pages: 703-718
Number of pages: 16