Self-supervised monocular depth and ego-motion estimation for CT-bronchoscopy fusion

Citations: 0
Authors
Chang, Qi [1 ,2 ]
Higgins, William E. [1 ,2 ]
Affiliations
[1] Penn State Univ, Sch Elect Engn & Comp Sci, University Pk, PA 16802 USA
[2] Penn State Univ, Sch Elect Engn & Comp Sci, Hershey, PA 17036 USA
Keywords
bronchoscopy; CT imaging; lung cancer; self-supervised learning; monocular depth estimation; depth and ego-motion estimation; CT-video fusion
DOI
10.1117/12.3004499
Chinese Library Classification
TP18 [Theory of Artificial Intelligence]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
The management of lung cancer necessitates robust diagnostic tools, with three-dimensional (3D) computed tomography (CT) imaging and bronchoscopy standing as pivotal complementary resources. Bronchoscopy captures live endobronchial video, providing striking detail of the airway tree's interior, while 3D CT scans contribute extensive anatomical knowledge. A significant gap persists, however, in linking these data-rich sources, such as in the fusion of video data from bronchoscopic airway exams and airway surface data from 3D CT scans. The main issue is the difficulty in simultaneously acquiring depth and camera pose information for bronchoscopic video frames. A solution to this problem can facilitate CT-video fusion/rendering, multimodal registration, and 3D cancer lesion localization. Deep-learning networks have recently been employed to estimate depth and ego-motion information. Unfortunately, it is challenging to acquire the required training data, consisting of ground-truth pairs of bronchoscopic video frames and corresponding depth maps. Along this line, generative adversarial networks (GANs) have shown promise in domain transformation from CT-based endoluminal surface views into synthesized bronchoscopic frames. These synthesized views are then paired with their CT-derived depth maps, generating valuable training data. Nonetheless, such domain transformation techniques fail to utilize frame-sequence information and supply no information about the camera's ego-motion. Parallel studies in other domains, such as endoscopy, have exploited the photometric consistency between adjacent frames to jointly offer depth and ego-motion estimation. Nevertheless, the textureless, smooth endoluminal surface of the airway hampers the generation of sharp, detailed depth maps. To address this problem, we present a self-supervised training strategy that incorporates both domain transformation and photometric consistency for the Monodepth2 deep learning architecture, improving the depth and ego-motion prediction of bronchoscopic video frames. Results on well-registered test data show that the proposed strategy achieves clear, precise predictions. In addition, effective reference scaling factors are derived from the test dataset, enabling real-world applications, such as 3D surface reconstruction, camera trajectory generation, and fusion between CT and bronchoscopic video.
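The photometric-consistency objective mentioned in the abstract follows the general Monodepth2 recipe: a source frame is warped into the target view via the predicted depth and relative pose, and an SSIM-plus-L1 reprojection error is minimized. Below is a minimal PyTorch sketch of that general loss, reflecting the public Monodepth2 formulation rather than the authors' implementation; all function and variable names are illustrative assumptions.

```python
# Minimal sketch of a Monodepth2-style photometric consistency loss.
# Assumption: standard public formulation, not the paper's actual code.
import torch
import torch.nn.functional as F

def ssim_loss(x, y):
    """Per-pixel (1 - SSIM) / 2 over 3x3 windows, as used by Monodepth2."""
    c1, c2 = 0.01 ** 2, 0.03 ** 2
    mu_x, mu_y = F.avg_pool2d(x, 3, 1, 1), F.avg_pool2d(y, 3, 1, 1)
    var_x = F.avg_pool2d(x * x, 3, 1, 1) - mu_x ** 2
    var_y = F.avg_pool2d(y * y, 3, 1, 1) - mu_y ** 2
    cov_xy = F.avg_pool2d(x * y, 3, 1, 1) - mu_x * mu_y
    num = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return torch.clamp((1 - num / den) / 2, 0, 1)

def warp_source_to_target(src_img, tgt_depth, T_tgt_to_src, K, K_inv):
    """Synthesize the target view from a source frame using depth and pose."""
    b, _, h, w = tgt_depth.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)]).float()   # 3 x h x w
    pix = pix.reshape(1, 3, -1).expand(b, -1, -1).to(tgt_depth.device)
    cam = (K_inv @ pix) * tgt_depth.reshape(b, 1, -1)          # back-project
    cam = torch.cat([cam, torch.ones(b, 1, h * w, device=cam.device)], 1)
    proj = K @ (T_tgt_to_src @ cam)[:, :3]                     # transform + project
    uv = proj[:, :2] / proj[:, 2:].clamp(min=1e-6)
    gx = 2 * uv[:, 0] / (w - 1) - 1                            # to [-1, 1]
    gy = 2 * uv[:, 1] / (h - 1) - 1
    grid = torch.stack([gx, gy], dim=-1).reshape(b, h, w, 2)
    return F.grid_sample(src_img, grid, padding_mode="border", align_corners=True)

def photometric_loss(tgt_img, src_img, tgt_depth, T_tgt_to_src, K, K_inv, alpha=0.85):
    """SSIM + L1 reprojection error between the target frame and the warped source."""
    pred = warp_source_to_target(src_img, tgt_depth, T_tgt_to_src, K, K_inv)
    return (alpha * ssim_loss(pred, tgt_img)
            + (1 - alpha) * (pred - tgt_img).abs()).mean()
```

On the textureless airway lumen this photometric term alone provides a weak supervisory signal, which is why the paper couples it with GAN-based domain transformation supervision.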
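Because self-supervised monocular depth is recovered only up to an unknown global scale, the reference scaling factors mentioned in the abstract are needed before metric applications such as 3D surface reconstruction. One common way to derive such a factor, sketched below under the assumption of a median-ratio scaling against CT-derived depths on registered frames, may differ from the paper's exact procedure; all names are hypothetical.

```python
# Hypothetical median-ratio scale recovery; names and procedure are
# illustrative assumptions, not the paper's exact method.
import numpy as np

def reference_scale_factor(ct_depth_maps, pred_depth_maps):
    """Median of CT-derived / predicted depth ratios over registered test frames."""
    ratios = [ct.ravel() / np.clip(pred.ravel(), 1e-6, None)
              for ct, pred in zip(ct_depth_maps, pred_depth_maps)]
    return float(np.median(np.concatenate(ratios)))

# Usage: metric_depth = reference_scale_factor(ct_maps, pred_maps) * pred_depth
```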
Pages: 12