Generalizable stereo depth estimation with masked image modelling

Cited by: 1
Authors
Tukra, Samyakh [1, 2]
Xu, Haozheng [1]
Xu, Chi [1]
Giannarou, Stamatia [1]
Affiliations
[1] Imperial Coll London, Hamlyn Ctr Robot Surg, Dept Surg & Canc, London, England
[2] Imperial Coll London, Exhibit Rd,South Kensington Campus, London, England
Keywords
computer vision; convolutional neural nets; learning (artificial intelligence); neural nets; stereo image processing
DOI
10.1049/htl2.12067
Chinese Library Classification number
R318 [Biomedical Engineering]
Discipline classification code
0831
Abstract
Generalizable and accurate stereo depth estimation is vital for 3D reconstruction, especially in surgery. Supervised learning methods achieve the best performance; however, the scarcity of ground truth data for surgical scenes limits their generalizability. Self-supervised methods do not need ground truth, but they suffer from scale ambiguity and incorrect disparity prediction caused by the inconsistency of the photometric loss. This work proposes a two-phase training procedure that is generalizable and retains the high performance of supervised methods. It entails: (1) performing self-supervised representation learning of the left and right views via masked image modelling (MIM) to learn generalizable semantic stereo features; and (2) using the MIM pre-trained model to learn a robust depth representation via supervised learning for disparity estimation on synthetic data only. To improve the stereo representations learnt via MIM, perceptual loss terms are introduced that explicitly encourage the learning of higher scene-level features. Qualitative and quantitative performance evaluation on surgical and natural scenes shows that the approach achieves sub-millimetre accuracy and the lowest errors, respectively, setting a new state of the art despite not being trained on surgical or natural scene data for disparity estimation.

This research develops a novel stereo depth estimation method that integrates self-supervised and supervised learning. It begins with masked image modelling for stereo-semantic feature learning, then refines the model through supervised training on synthetic data for disparity estimation. Enhanced by perceptual loss and model design, the method achieves sub-millimetre accuracy in surgical and natural scenes, setting a new benchmark without requiring real-world data.
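As a rough illustration of the masking step that masked image modelling relies on, the sketch below hides a random subset of non-overlapping patches in each view of a stereo pair. It is not the authors' implementation: the abstract does not state the patch size, the masking ratio, or whether the two views share a mask, so the helper name `mask_patches`, the 16-pixel patches, the 0.5 ratio, and the independent masks per view are all assumptions chosen for illustration.

```python
import numpy as np

def mask_patches(image, patch=16, ratio=0.5, rng=None):
    """Zero out a random subset of non-overlapping patches (hypothetical helper).

    Returns the masked image and a boolean patch grid (True = masked patch).
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape[:2]
    if h % patch or w % patch:
        raise ValueError("image height and width must be multiples of patch")
    gh, gw = h // patch, w // patch
    n_masked = int(round(ratio * gh * gw))
    # Pick which patches to hide, without replacement.
    flat = np.zeros(gh * gw, dtype=bool)
    flat[rng.choice(gh * gw, size=n_masked, replace=False)] = True
    grid = flat.reshape(gh, gw)
    # Expand the patch grid to pixel resolution.
    pixel_mask = np.repeat(np.repeat(grid, patch, axis=0), patch, axis=1)
    masked = image.copy()
    masked[pixel_mask] = 0  # assumed fill value; a learned mask token is also common
    return masked, grid

# Stand-in stereo pair (random arrays, not real data).
rng = np.random.default_rng(0)
left = rng.random((64, 128, 3))
right = rng.random((64, 128, 3))
left_masked, left_grid = mask_patches(left, rng=rng)
right_masked, right_grid = mask_patches(right, rng=rng)
```

In a pre-training loop, a network would be asked to reconstruct the hidden patches from the visible ones; the reconstruction and perceptual loss terms mentioned in the abstract are outside the scope of this sketch.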
Pages: 108-116
Page count: 9