Global-local multi-stage temporal convolutional network for cataract surgery phase recognition

Cited by: 1
Authors
Fang, Lixin [1, 2]
Mou, Lei [2]
Gu, Yuanyuan [2, 8]
Hu, Yan [3]
Chen, Bang [2]
Chen, Xu [4, 5, 6, 7]
Wang, Yang [9]
Liu, Jiang [3]
Zhao, Yitian [2, 8]
Affiliations
[1] Zhejiang Univ Technol, Coll Mech Engn, Hangzhou 310014, Peoples R China
[2] Chinese Acad Sci, Cixi Inst Biomed Engn, Ningbo Inst Mat Technol & Engn, Ningbo, Peoples R China
[3] Southern Univ Sci & Technol, Dept Comp Sci & Engn, Shenzhen 518055, Peoples R China
[4] Shanghai Aier Eye Hosp, Dept Ophthalmol, Shanghai, Peoples R China
[5] Shanghai Aier Qingliang Eye Hosp, Dept Ophthalmol, Shanghai, Peoples R China
[6] Jinan Univ, Aier Eye Hosp, 601 Huangpu Rd West, Guangzhou, Peoples R China
[7] Cent South Univ Changsha, Aier Sch Ophthalmol, Changsha, Hunan, Peoples R China
[8] Chinese Acad Sci, Zhejiang Engn Res Ctr Biomed Mat, Cixi Inst Biomed Engn, Ningbo Inst Mat Technol & Engn, Ningbo 315300, Peoples R China
[9] Chinese Acad Sci, Aerosp Informat Res Inst, Beijing, Peoples R China
Keywords
Surgical phase recognition; Temporal convolutional networks; Cataract surgery videos; Deep learning; SEGMENTATION; WORKFLOW; VIDEOS; TASKS;
DOI
10.1186/s12938-022-01048-w
Chinese Library Classification number
R318 [Biomedical engineering];
Discipline classification code
0831;
Abstract
Background: Surgical video phase recognition is an essential technique in computer-assisted surgical systems for monitoring surgical procedures; it can help surgeons standardize procedures and improve postsurgical assessment and indexing. However, the high similarity between phases and the temporal variations across cataract videos remain the main challenges for video phase recognition.
Methods: We introduce a global-local multi-stage temporal convolutional network (GL-MSTCN) to capture the subtle differences between highly similar surgical phases and to mitigate the temporal variations of surgical videos. The method consists of a triple-stream network (pupil stream, instrument stream, and video frame stream) and a multi-stage temporal convolutional network. The triple-stream network first detects the pupil and surgical instrument regions in each frame separately and then extracts fine-grained semantic features of the video frames. The multi-stage temporal convolutional network improves surgical phase recognition by capturing longer-range temporal features through dilated convolutional layers with varying receptive fields.
Results: The method is validated on the CS Video dataset (32 cataract surgery videos) and the public Cataract101 dataset (101 cataract surgery videos), where it outperforms state-of-the-art approaches with 95.8% and 96.5% accuracy, respectively.
Conclusions: The experimental results show that combining global and local feature information helps the model explore fine-grained features and mitigate temporal and spatial variations, thus improving the surgical phase recognition performance of the proposed GL-MSTCN.
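To illustrate how stacked dilated convolutions widen the temporal context, here is a minimal pure-Python sketch. It is not the authors' implementation: the abstract does not state the kernel size, the number of layers, or the dilation schedule, so the kernel size of 3 and the per-layer doubling of the dilation are assumptions, and the helper names `receptive_field` and `dilated_conv1d` are hypothetical.

```python
def receptive_field(kernel_size, dilations):
    """Input frames seen by one output of stacked stride-1 dilated 1-D convolutions."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def dilated_conv1d(signal, weights, dilation):
    """Same-length dilated 1-D convolution with zero padding (odd kernel size)."""
    half = len(weights) // 2
    out = []
    for t in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(weights):
            idx = t + (j - half) * dilation
            if 0 <= idx < len(signal):  # positions outside the signal count as zero
                acc += w * signal[idx]
        out.append(acc)
    return out


# Assumed configuration: kernel size 3, dilation doubling at every layer.
dilations = [2 ** i for i in range(10)]   # 1, 2, 4, ..., 512
rf = receptive_field(3, dilations)

# Empirical check: send a unit impulse through the stack and count
# how many output frames it reaches.
signal = [0.0] * (2 * rf + 1)
signal[rf] = 1.0
for d in dilations:
    signal = dilated_conv1d(signal, [1.0, 1.0, 1.0], d)
touched = sum(1 for v in signal if v != 0.0)

print(rf, touched)
```

Under these assumptions the receptive field grows exponentially with depth while the parameter count grows only linearly, which is why a few dilated layers can cover a long stretch of a surgical video.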
Pages: 18