Generalizable stereo depth estimation with masked image modelling

Cited by: 1
Authors
Tukra, Samyakh [1 ,2 ]
Xu, Haozheng [1 ]
Xu, Chi [1 ]
Giannarou, Stamatia [1 ]
Affiliations
[1] Imperial Coll London, Hamlyn Ctr Robot Surg, Dept Surg & Canc, London, England
[2] Imperial Coll London, Exhibit Rd, South Kensington Campus, London, England
Keywords
computer vision; convolutional neural nets; learning (artificial intelligence); neural nets; stereo image processing;
DOI
10.1049/htl2.12067
Chinese Library Classification number
R318 [Biomedical Engineering]
Discipline classification code
0831
Abstract
Generalizable and accurate stereo depth estimation is vital for 3D reconstruction, especially in surgery. Supervised learning methods obtain the best performance; however, the limited ground truth data available for surgical scenes limits their generalizability. Self-supervised methods do not need ground truth, but suffer from scale ambiguity and incorrect disparity prediction due to the inconsistency of the photometric loss. This work proposes a two-phase training procedure that is generalizable and retains the high performance of supervised methods. It entails: (1) performing self-supervised representation learning of the left and right views via masked image modelling (MIM) to learn generalizable semantic stereo features; and (2) utilizing the MIM pre-trained model to learn robust depth representations via supervised learning for disparity estimation on synthetic data only. To improve the stereo representations learnt via MIM, perceptual loss terms are introduced, which explicitly encourage the learning of higher-level scene features. Qualitative and quantitative performance evaluation on surgical and natural scenes shows that the approach achieves sub-millimetre accuracy and the lowest errors, respectively, setting a new state of the art, despite not being trained on surgical or natural scene data for disparity estimation.

This research develops a novel stereo depth estimation method that integrates self-supervised and supervised learning. It begins with masked image modelling for stereo-semantic feature learning, then refines the model through supervised training on synthetic data for disparity estimation. Enhanced by perceptual loss and model design, the method achieves sub-millimetre accuracy on surgical scenes and the lowest errors on natural scenes, setting a new benchmark without requiring real-world disparity training data.
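The MIM pre-training phase described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the uniform random patch masking (in the style of masked autoencoders), the 75% mask ratio, the finite-difference filter standing in for a pretrained perceptual network (the hypothetical helper toy_features), and the weight lam are all assumptions, and the network that would produce the reconstructions is omitted. The sketch only shows how a masked reconstruction loss and a feature-space (perceptual) loss could be combined over the left and right views.

```python
import numpy as np


def random_patch_mask(h, w, patch, ratio, rng):
    """Boolean (h, w) mask; True marks pixels inside randomly chosen masked patches."""
    assert h % patch == 0 and w % patch == 0, "image size must be a multiple of patch"
    gh, gw = h // patch, w // patch
    n_mask = int(round(ratio * gh * gw))
    flags = np.zeros(gh * gw, dtype=bool)
    flags[rng.permutation(gh * gw)[:n_mask]] = True
    grid = flags.reshape(gh, gw)
    # Expand each patch flag into a patch x patch block of pixels.
    return np.repeat(np.repeat(grid, patch, axis=0), patch, axis=1)


def toy_features(img):
    """Hypothetical stand-in for a pretrained perceptual network:
    horizontal and vertical finite differences of a 2-D image."""
    return np.diff(img, axis=1), np.diff(img, axis=0)


def masked_recon_loss(pred, target, mask):
    """Mean squared error over masked pixels only."""
    return float(np.mean((pred[mask] - target[mask]) ** 2))


def perceptual_loss(pred, target):
    """Sum of mean L1 distances between feature maps of prediction and target."""
    pairs = zip(toy_features(pred), toy_features(target))
    return float(sum(np.mean(np.abs(fp - ft)) for fp, ft in pairs))


def stereo_mim_loss(pred_l, pred_r, left, right, mask_l, mask_r, lam=0.1):
    """Masked reconstruction + weighted perceptual loss, summed over both views.
    lam is a hypothetical weight, not a value taken from the paper."""
    total = 0.0
    for pred, target, mask in ((pred_l, left, mask_l), (pred_r, right, mask_r)):
        total += masked_recon_loss(pred, target, mask) + lam * perceptual_loss(pred, target)
    return total


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    left = rng.random((32, 64))
    right = np.roll(left, -4, axis=1)  # toy right view: a uniform 4-pixel horizontal shift
    mask_l = random_patch_mask(32, 64, patch=8, ratio=0.75, rng=rng)
    mask_r = random_patch_mask(32, 64, patch=8, ratio=0.75, rng=rng)
    # A perfect reconstruction of both views gives zero loss.
    print(stereo_mim_loss(left, right, left, right, mask_l, mask_r))  # prints 0.0
```

Computing the reconstruction term only over masked pixels follows the usual MIM convention, while the perceptual term compares whole images in feature space; in a real pipeline a pretrained network would replace the hypothetical toy_features helper.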
Pages: 108-116
Number of pages: 9