Augmenting efficient real-time surgical instrument segmentation in video with point tracking and Segment Anything

Cited by: 0
Authors
Wu, Zijian [1]
Schmidt, Adam [1]
Kazanzides, Peter [2]
Salcudean, Septimiu E. [1]
Affiliations
[1] Univ British Columbia, Dept Elect & Comp Engn, Robot & Control Lab, Vancouver, BC V6T 1Z4, Canada
[2] Johns Hopkins Univ, Dept Comp Sci, Baltimore, MD USA
Funding
Canada Foundation for Innovation;
Keywords
medical robotics; robot vision; image segmentation; surgery; RECOGNITION;
DOI
10.1049/htl2.12111
Chinese Library Classification number
R318 [Biomedical Engineering];
Discipline classification code
0831;
Abstract
The Segment Anything Model (SAM) is a powerful vision foundation model that is revolutionizing the traditional paradigm of segmentation. Despite this, its reliance on prompting each frame and its large computational cost limit its usage in robotically assisted surgery. Applications such as augmented reality guidance require little user intervention along with efficient inference to be usable clinically. This study addresses these limitations by adopting lightweight SAM variants to meet the efficiency requirement and employing fine-tuning techniques to enhance their generalization in surgical scenes. Recent advancements in tracking any point have shown promising results in both accuracy and efficiency, particularly when points are occluded or leave the field of view. Inspired by this progress, a novel framework is presented that combines an online point tracker with a lightweight SAM model that is fine-tuned for surgical instrument segmentation. Sparse points within the region of interest are tracked and used to prompt SAM throughout the video sequence, providing temporal consistency. The quantitative results surpass the state-of-the-art semi-supervised video object segmentation method XMem on the EndoVis 2015 dataset with 84.8 IoU and 91.0 Dice. The method achieves performance comparable to XMem and transformer-based fully supervised segmentation methods on the ex vivo UCL dVRK and in vivo CholecSeg8k datasets. In addition, the proposed method shows promising zero-shot generalization ability on the label-free STIR dataset. In terms of efficiency, the method was tested on a single GeForce RTX 4060 GPU and on a single GeForce RTX 4090 GPU, achieving inference speeds of over 25 FPS and over 90 FPS, respectively. Code is available at: .
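The abstract reports results as IoU and Dice scores. As a minimal sketch of how these two overlap metrics are commonly computed for one pair of binary masks, the following Python snippet may help; the function name `iou_and_dice`, the nested-list mask format, and the convention for two empty masks are illustrative assumptions and are not taken from the paper or its code.

```python
from typing import Sequence, Tuple

# Assumed mask format: a 2D grid of 0/1 values, 1 = instrument pixel.
Mask = Sequence[Sequence[int]]


def iou_and_dice(pred: Mask, target: Mask) -> Tuple[float, float]:
    """Return (IoU, Dice) for two binary masks of the same shape."""
    inter = pred_area = target_area = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            p, t = bool(p), bool(t)
            inter += p and t        # pixel is set in both masks
            pred_area += p          # pixel is set in the prediction
            target_area += t        # pixel is set in the ground truth
    union = pred_area + target_area - inter
    if union == 0:
        # Assumption: two empty masks count as a perfect match.
        return 1.0, 1.0
    iou = inter / union
    dice = 2 * inter / (pred_area + target_area)
    return iou, dice


# Toy example: 3 predicted pixels, 3 ground-truth pixels, 2 shared.
pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]
iou, dice = iou_and_dice(pred, target)
print(f"IoU = {iou:.3f}, Dice = {dice:.3f}")
```

Both functions return values in [0, 1]; scores such as those quoted in the abstract are these values expressed as percentages, typically averaged over frames or sequences.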
Pages: 9