Generalizable stereo depth estimation with masked image modelling

Cited by: 1
Authors
Tukra, Samyakh [1, 2]
Xu, Haozheng [1]
Xu, Chi [1]
Giannarou, Stamatia [1]
Affiliations
[1] Imperial Coll London, Hamlyn Ctr Robot Surg, Dept Surg & Canc, London, England
[2] Imperial Coll London, Exhibit Rd,South Kensington Campus, London, England
Keywords
computer vision; convolutional neural nets; learning (artificial intelligence); neural nets; stereo image processing;
DOI
10.1049/htl2.12067
Chinese Library Classification number
R318 [Biomedical Engineering];
Discipline classification code
0831;
Abstract
Generalizable and accurate stereo depth estimation is vital for 3D reconstruction, especially in surgery. Supervised learning methods obtain the best performance; however, the limited ground truth data available for surgical scenes limits their generalizability. Self-supervised methods do not need ground truth, but suffer from scale ambiguity and incorrect disparity prediction due to the inconsistency of the photometric loss. This work proposes a two-phase training procedure that is generalizable and retains the high performance of supervised methods. It entails: (1) performing self-supervised representation learning of left and right views via masked image modelling (MIM) to learn generalizable semantic stereo features; and (2) utilizing the MIM pre-trained model to learn a robust depth representation via supervised learning for disparity estimation on synthetic data only. To improve the stereo representations learnt via MIM, perceptual loss terms are introduced, which explicitly encourage the learning of higher scene-level features. Qualitative and quantitative performance evaluation on surgical and natural scenes shows that the approach achieves sub-millimetre accuracy and the lowest errors, respectively, setting a new state of the art, despite not training on surgical or natural scene data for disparity estimation.

This research develops a novel stereo depth estimation method that integrates self-supervised and supervised learning. It begins with masked image modelling for stereo-semantic feature learning, then refines the model through supervised training on synthetic data for disparity estimation. Enhanced by perceptual loss and model design, the method achieves sub-millimetre accuracy in surgical and natural scenes, setting a new benchmark without requiring real-world data.
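The first training phase described in the abstract, masked image modelling on the left and right views, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not state the masking strategy, patch size, mask ratio, or network, so the function names `random_patch_mask` and `masked_mse`, the 16-pixel patch, and the 50% ratio are all assumptions chosen for illustration. The perceptual loss terms are omitted because they require a pre-trained feature network, and a trivial placeholder stands in for the model's reconstruction.

```python
import numpy as np


def random_patch_mask(height, width, patch, mask_ratio, rng):
    """Return a boolean (height, width) mask; True marks hidden pixels.

    Hypothetical helper: masks a random subset of non-overlapping patches.
    """
    grid_h, grid_w = height // patch, width // patch
    n_patches = grid_h * grid_w
    n_masked = int(round(mask_ratio * n_patches))
    flat = np.zeros(n_patches, dtype=bool)
    flat[rng.choice(n_patches, size=n_masked, replace=False)] = True
    grid = flat.reshape(grid_h, grid_w)
    # Expand each patch-level decision to patch x patch pixels.
    return np.repeat(np.repeat(grid, patch, axis=0), patch, axis=1)


def masked_mse(pred, target, mask):
    """Mean squared error computed over the masked pixels only."""
    sq_err = (pred - target) ** 2  # shape (H, W, C)
    return float(sq_err[mask].mean())


rng = np.random.default_rng(0)
# Stand-in stereo pair; real inputs would be rectified left/right images.
left = rng.random((64, 128, 3))
right = rng.random((64, 128, 3))

# Assumed settings: 16 x 16 patches, half of them hidden, per view.
mask_left = random_patch_mask(64, 128, patch=16, mask_ratio=0.5, rng=rng)
mask_right = random_patch_mask(64, 128, patch=16, mask_ratio=0.5, rng=rng)

# Zero out the hidden patches before they would be fed to a model.
masked_left = np.where(mask_left[..., None], 0.0, left)
masked_right = np.where(mask_right[..., None], 0.0, right)

# Placeholder "reconstruction": the masked input itself. A trained model
# would instead predict the hidden content from the visible patches.
loss = masked_mse(masked_left, left, mask_left) + masked_mse(
    masked_right, right, mask_right
)
print(f"masked reconstruction loss: {loss:.4f}")
```

Restricting the loss to hidden pixels is a common choice in masked image modelling, since it forces the model to infer content from context instead of copying visible pixels; whether the paper does the same is not stated in the abstract.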
Pages: 108-116
Number of pages: 9