Rendezvous in time: an attention-based temporal fusion approach for surgical triplet recognition

Cited by: 6
Authors
Sharma, Saurav [1 ]
Nwoye, Chinedu Innocent [1 ]
Mutter, Didier [2 ,3 ]
Padoy, Nicolas [1 ,2 ]
Affiliations
[1] Univ Strasbourg, ICube, CNRS, Strasbourg, France
[2] IHU Strasbourg, Strasbourg, France
[3] Univ Hosp Strasbourg, Strasbourg, France
Keywords
Surgical triplet recognition; Laparoscopic surgery; Temporal modeling; Action triplet; Attention model
DOI
10.1007/s11548-023-02914-1
Chinese Library Classification
R318 [Biomedical Engineering]
Subject Classification Code
0831
Abstract
Purpose: One of the recent advances in surgical AI is the recognition of surgical activities as triplets of (instrument, verb, target). Although they provide detailed information for computer-assisted intervention, current triplet recognition approaches rely only on single-frame features. Exploiting temporal cues from earlier frames would improve the recognition of surgical action triplets from videos.
Methods: In this paper, we propose Rendezvous in Time (RiT), a deep learning model that extends the state-of-the-art model, Rendezvous, with temporal modeling. Focusing more on the verbs, RiT explores the connectedness of current and past frames to learn temporal attention-based features for enhanced triplet recognition.
Results: We validate our proposal on the challenging surgical triplet dataset, CholecT45, demonstrating improved recognition of the verb and the triplet, along with other interactions involving the verb such as (instrument, verb). Qualitative results show that RiT produces smoother predictions for most triplet instances than state-of-the-art methods.
Conclusion: We present a novel attention-based approach that leverages the temporal fusion of video frames to model the evolution of surgical actions and exploit their benefits for surgical triplet recognition.
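To make the Methods description concrete, the following is a minimal, hypothetical PyTorch sketch of attention-based temporal fusion: the current frame's feature queries the features of past frames via cross-attention, and the fused feature drives a multi-label triplet classifier. The dimensions, layer choices, and residual fusion scheme here are illustrative assumptions, not the published RiT architecture (see the paper at the DOI above for the actual design).

import torch
import torch.nn as nn

class TemporalAttentionFusion(nn.Module):
    # Illustrative sketch only; feat_dim, num_heads, and the residual
    # fusion scheme are assumptions for demonstration, not the authors' RiT.
    def __init__(self, feat_dim: int = 256, num_heads: int = 8, num_classes: int = 100):
        super().__init__()
        # The current frame's feature serves as the attention query;
        # past-frame features serve as keys and values.
        self.cross_attn = nn.MultiheadAttention(feat_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(feat_dim)
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, current: torch.Tensor, past: torch.Tensor) -> torch.Tensor:
        # current: (B, D) feature of the frame to classify
        # past:    (B, T, D) features of the T preceding frames
        query = current.unsqueeze(1)                      # (B, 1, D)
        attended, _ = self.cross_attn(query, past, past)  # attend over time
        fused = self.norm(query + attended).squeeze(1)    # residual fusion -> (B, D)
        return self.classifier(fused)                     # one logit per triplet class

# Example usage: 100 triplet classes (as defined in CholecT45), 5 past frames.
model = TemporalAttentionFusion(feat_dim=256, num_heads=8, num_classes=100)
current_feat = torch.randn(2, 256)        # batch of 2 current-frame features
past_feats = torch.randn(2, 5, 256)       # 5 past-frame features per sample
logits = model(current_feat, past_feats)  # shape (2, 100)

The design intuition matches the abstract: attending over earlier frames lets the verb prediction draw on how the action evolves, rather than on a single frame's appearance alone.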
Pages: 1053-1059
Page count: 7