Rendezvous in time: an attention-based temporal fusion approach for surgical triplet recognition

Cited by: 6
Authors
Sharma, Saurav [1]
Nwoye, Chinedu Innocent [1]
Mutter, Didier [2, 3]
Padoy, Nicolas [1, 2]
Affiliations
[1] Univ Strasbourg, ICube, CNRS, Strasbourg, France
[2] IHU Strasbourg, Strasbourg, France
[3] Univ Hosp Strasbourg, Strasbourg, France
Keywords
Surgical triplet recognition; Laparoscopic surgery; Temporal modeling; Action triplet; Attention model
DOI
10.1007/s11548-023-02914-1
Chinese Library Classification number
R318 [Biomedical Engineering]
Discipline classification code
0831
Abstract
Purpose: One of the recent advances in surgical AI is the recognition of surgical activities as triplets of (instrument, verb, target). Although triplets provide detailed information for computer-assisted intervention, current triplet recognition approaches rely only on single-frame features. Exploiting the temporal cues from earlier frames would improve the recognition of surgical action triplets from videos.
Methods: In this paper, we propose Rendezvous in Time (RiT), a deep learning model that extends the state-of-the-art model, Rendezvous, with temporal modeling. Focusing more on the verbs, RiT explores the connectedness of current and past frames to learn temporal attention-based features for enhanced triplet recognition.
Results: We validate our proposal on the challenging surgical triplet dataset, CholecT45, demonstrating improved recognition of the verb and triplet, along with other interactions involving the verb such as (instrument, verb). Qualitative results show that RiT produces smoother predictions for most triplet instances than state-of-the-art methods.
Conclusion: We present a novel attention-based approach that leverages the temporal fusion of video frames to model the evolution of surgical actions and exploit their benefits for surgical triplet recognition.
Pages: 1053-1059
Page count: 7
Related papers
50 in total
  • [21] Learning Active Fusion of Multiple Experts' Decisions: An Attention-Based Approach
    Mirian, Maryam S.
    Ahmadabadi, Majid Nili
    Araabi, Babak N.
    Siegwart, Roland R.
    NEURAL COMPUTATION, 2011, 23 (02) : 558 - 591
  • [22] Attention-Based Models for Speech Recognition
    Chorowski, Jan
    Bahdanau, Dzmitry
    Serdyuk, Dmitriy
    Cho, Kyunghyun
    Bengio, Yoshua
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 28 (NIPS 2015), 2015, 28
  • [23] Attention-based Text Recognition in the Wild
    Yan, Zhi-Chen
    Yu, Stephanie A.
    PROCEEDINGS OF THE 1ST INTERNATIONAL CONFERENCE ON DEEP LEARNING THEORY AND APPLICATIONS (DELTA), 2020, : 42 - 49
  • [24] ATTENTION-BASED PARTIAL FACE RECOGNITION
    Hoermann, Stefan
    Zhang, Zeyuan
    Knoche, Martin
    Teepe, Torben
    Rigoll, Gerhard
    2021 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2021, : 2978 - 2982
  • [25] An Attention-Based Approach for Chemical Compound and Drug Named Entity Recognition
    Yang P.
    Yang Z.
    Luo L.
    Lin H.
    Wang J.
    Jisuanji Yanjiu yu Fazhan/Computer Research and Development, 2018, 55 (07) : 1548 - 1556
  • [26] AMFF: A new attention-based multi-feature fusion method for intention recognition
    Liu, Cong
    Xu, Xiaolong
    KNOWLEDGE-BASED SYSTEMS, 2021, 233
  • [27] Attention-based interactive multi-level feature fusion for named entity recognition
    Yiwu Xu
    Yun Chen
    Scientific Reports, 15 (1)
  • [28] Attention-Based Multiview Re-Observation Fusion Network for Skeletal Action Recognition
    Fan, Zhaoxuan
    Zhao, Xu
    Lin, Tianwei
    Su, Haisheng
    IEEE TRANSACTIONS ON MULTIMEDIA, 2019, 21 (02) : 363 - 374
  • [29] Residual Attention-based Fusion for Video Classification
    Pouyanfar, Samira
    Wang, Tianyi
    Chen, Shu-Ching
    2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPRW 2019), 2019, : 478 - 480
  • [30] Attention-based spatial-temporal hierarchical ConvLSTM network for action recognition in videos
    Xue, Fei
    Ji, Hongbing
    Zhang, Wenbo
    Cao, Yi
    IET COMPUTER VISION, 2019, 13 (08) : 708 - 718