Generalizable stereo depth estimation with masked image modelling

Cited by: 1
Authors
Tukra, Samyakh [1, 2]
Xu, Haozheng [1]
Xu, Chi [1]
Giannarou, Stamatia [1]
Affiliations
[1] Imperial Coll London, Hamlyn Ctr Robot Surg, Dept Surg & Canc, London, England
[2] Imperial Coll London, Exhibit Rd,South Kensington Campus, London, England
Keywords
computer vision; convolutional neural nets; learning (artificial intelligence); neural nets; stereo image processing;
DOI
10.1049/htl2.12067
Chinese Library Classification (CLC) number
R318 [Biomedical Engineering];
Discipline classification code
0831;
Abstract
Generalizable and accurate stereo depth estimation is vital for 3D reconstruction, especially in surgery. Supervised learning methods achieve the best performance; however, the limited ground-truth data available for surgical scenes restricts their generalizability. Self-supervised methods do not need ground truth, but they suffer from scale ambiguity and incorrect disparity predictions caused by the inconsistency of the photometric loss. This work proposes a two-phase training procedure that is generalizable and retains the high performance of supervised methods. It entails: (1) self-supervised representation learning of the left and right views via masked image modelling (MIM) to learn generalizable semantic stereo features; and (2) using the MIM pre-trained model to learn a robust depth representation via supervised learning for disparity estimation on synthetic data only. To improve the stereo representations learnt via MIM, perceptual loss terms are introduced that explicitly encourage the learning of higher scene-level features. Qualitative and quantitative evaluation on surgical and natural scenes shows that the approach achieves sub-millimetre accuracy and the lowest errors, respectively, setting a new state of the art, despite not being trained on surgical or natural scene data for disparity estimation.
This research develops a novel stereo depth estimation method that integrates self-supervised and supervised learning. It begins with masked image modelling for stereo-semantic feature learning, then refines the model through supervised training on synthetic data for disparity estimation. Enhanced by perceptual loss and model design, the method achieves sub-millimetre accuracy in surgical and natural scenes, setting a new benchmark without requiring real-world data.
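As an illustration of the first training phase, the masking step of masked image modelling on a stereo pair might look roughly like the sketch below. This is not the authors' implementation: the abstract does not state the patch size, the masking ratio, or whether the two views share a mask, so `patch=16`, `ratio=0.75`, independent masks per view, and the helper names `mask_patches` and `masked_mse` are all assumptions made for illustration. The perceptual loss terms are not shown.

```python
import numpy as np


def mask_patches(image, patch=16, ratio=0.75, rng=None):
    """Zero out a random subset of non-overlapping square patches.

    Returns the masked image and a boolean (H, W) mask that is True
    where pixels were hidden. patch and ratio are assumed values.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape[:2]
    assert h % patch == 0 and w % patch == 0, "image must tile evenly"
    gh, gw = h // patch, w // patch
    n_patches = gh * gw
    n_masked = int(round(ratio * n_patches))

    # Choose which patches to hide, then expand the patch grid to pixels.
    flat = np.zeros(n_patches, dtype=bool)
    flat[rng.choice(n_patches, size=n_masked, replace=False)] = True
    grid = flat.reshape(gh, gw)
    pixel_mask = np.repeat(np.repeat(grid, patch, axis=0), patch, axis=1)

    masked = image.copy()
    masked[pixel_mask] = 0
    return masked, pixel_mask


def masked_mse(pred, target, pixel_mask):
    """Mean squared reconstruction error over the hidden pixels only."""
    return float(((pred - target) ** 2)[pixel_mask].mean())


# Usage: mask the left and right views independently (an assumption).
rng = np.random.default_rng(0)
left = rng.random((64, 64, 3))
right = rng.random((64, 64, 3))
left_masked, left_mask = mask_patches(left, rng=rng)
right_masked, right_mask = mask_patches(right, rng=rng)
```

In a full pipeline, a network would reconstruct the hidden patches from `left_masked` and `right_masked`, and a loss such as `masked_mse` would be evaluated on its output for each view.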
Pages: 108-116
Page count: 9