TRandAugment: temporal random augmentation strategy for surgical activity recognition from videos

Cited by: 2
Authors
Ramesh, Sanat [1, 2]
Dall'Alba, Diego [1]
Gonzalez, Cristians [3, 5]
Yu, Tong [2]
Mascagni, Pietro [5, 6]
Mutter, Didier [3, 4, 5]
Marescaux, Jacques [4]
Fiorini, Paolo [1]
Padoy, Nicolas [2, 5]
Affiliations
[1] Univ Verona, Altair Robot Lab, I-37134 Verona, Italy
[2] Univ Strasbourg, CNRS, ICube, F-67000 Strasbourg, France
[3] Univ Hosp Strasbourg, F-67000 Strasbourg, France
[4] IRCAD, F-67000 Strasbourg, France
[5] IHU Strasbourg, Inst Image Guided Surg, F-67000 Strasbourg, France
[6] Fdn Policlin Univ Agostino Gemelli IRCCS, I-00168 Rome, Italy
Keywords
Data augmentation; Temporal augmentation; Surgical activity recognition; Temporal convolutional networks; Gastric bypass procedures; Cataract procedures
DOI
10.1007/s11548-023-02864-8
Chinese Library Classification (CLC) number
R318 [Biomedical Engineering]
Discipline classification code
0831
Abstract
Purpose: Automatic recognition of surgical activities from intraoperative surgical videos is crucial for developing intelligent support systems for computer-assisted interventions. Current state-of-the-art recognition methods are based on deep learning, where data augmentation has shown the potential to improve generalization. This has spurred work on automated and simplified augmentation strategies for image classification and object detection on datasets of still images. Extending such augmentation methods to videos is not straightforward, as the temporal dimension needs to be considered. Furthermore, surgical videos pose additional challenges, as they are composed of multiple, interconnected, long-duration activities.

Methods: This work proposes a new simplified augmentation method, called TRandAugment, specifically designed for long surgical videos. It treats each video as an assembly of temporal segments and applies consistent but random transformations to each segment. The proposed augmentation method is used to train an end-to-end spatiotemporal model consisting of a CNN (ResNet50) followed by a TCN.

Results: The effectiveness of the proposed method is demonstrated on two surgical video datasets, Bypass40 and CATARACTS, and on two tasks, surgical phase recognition and step recognition. TRandAugment adds a performance boost of 1-6% over previous state-of-the-art methods, which use manually designed augmentations.

Conclusion: This work presents a simplified and automated augmentation method for long surgical videos. The proposed method has been validated on different datasets and tasks, indicating the importance of devising temporal augmentation methods for long surgical videos.
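The segment-wise idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the transform pool (`brighten`, `hflip`, `identity`), the equal-length segmentation, the uniform magnitude sampling, and all function and parameter names below are assumptions made for illustration only. Frames are represented as nested lists of grayscale values in [0, 1] so that the example needs nothing beyond the standard library.

```python
import random

# Hypothetical transform pool; the abstract does not list the actual transforms.
def brighten(frame, magnitude):
    # Shift every pixel by `magnitude`, clamped to [0, 1].
    return [[min(1.0, max(0.0, p + magnitude)) for p in row] for row in frame]

def hflip(frame, magnitude):
    # Mirror the frame horizontally; the magnitude is unused here.
    return [row[::-1] for row in frame]

def identity(frame, magnitude):
    # Leave the frame unchanged.
    return frame

TRANSFORMS = [brighten, hflip, identity]

def split_segments(num_frames, num_segments):
    """Return (start, end) pairs covering range(num_frames) as contiguous,
    near-equal segments (equal-length splitting is an assumption)."""
    bounds = [i * num_frames // num_segments for i in range(num_segments + 1)]
    return [(bounds[i], bounds[i + 1]) for i in range(num_segments)]

def segmentwise_augment(frames, num_segments, max_magnitude=0.3, seed=None):
    """Draw one random transform and magnitude per temporal segment and apply
    it to every frame of that segment, so the augmentation is consistent
    within a segment and varies between segments."""
    rng = random.Random(seed)
    augmented, choices = [], []
    for start, end in split_segments(len(frames), num_segments):
        op = rng.choice(TRANSFORMS)
        magnitude = rng.uniform(0.0, max_magnitude)
        choices.append((op.__name__, magnitude))
        augmented.extend(op(frame, magnitude) for frame in frames[start:end])
    return augmented, choices

# Example: a toy "video" of 10 frames, each 2x2 pixels, split into 3 segments.
video = [[[0.1 * t, 0.2], [0.3, 0.4]] for t in range(10)]
augmented, choices = segmentwise_augment(video, num_segments=3, seed=0)
```

Because frame order and frame count are unchanged in this sketch, any per-frame phase or step labels stay aligned with the augmented frames.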
Pages: 1665-1672
Number of pages: 8
Related papers
50 in total
  • [1] TRandAugment: temporal random augmentation strategy for surgical activity recognition from videos
    Sanat Ramesh
    Diego Dall’Alba
    Cristians Gonzalez
    Tong Yu
    Pietro Mascagni
    Didier Mutter
    Jacques Marescaux
    Paolo Fiorini
    Nicolas Padoy
    International Journal of Computer Assisted Radiology and Surgery, 2023, 18 : 1665 - 1672
  • [2] An Empirical Study on Activity Recognition in Long Surgical Videos
    He, Zhuohong
    Mottaghi, Ali
    Sharghi, Aidean
    Jamal, Muhammad Abdullah
    Mohareri, Omid
    MACHINE LEARNING FOR HEALTH, VOL 193, 2022, 193 : 356 - 372
  • [3] HiRF: Hierarchical Random Field for Collective Activity Recognition in Videos
    Amer, Mohamed Rabie
    Lei, Peng
    Todorovic, Sinisa
    COMPUTER VISION - ECCV 2014, PT VI, 2014, 8694 : 572 - 585
  • [4] HARTIV: Human Activity Recognition Using Temporal Information in Videos
    Deotale, Disha
    Verma, Madhushi
    Suresh, P.
    Jangir, Sunil Kumar
    Kaur, Manjit
    Idris, Sahar Ahmed
    Alshazly, Hammam
    CMC-COMPUTERS MATERIALS & CONTINUA, 2022, 70 (02) : 3919 - 3938
  • [5] PhacoTrainer: Deep Learning for Activity Recognition in Cataract Surgical Videos
    Yeh, Hsu-Hang
    Jain, Anjal
    Fox, Olivia
    Wang, Sophia
    INVESTIGATIVE OPHTHALMOLOGY & VISUAL SCIENCE, 2021, 62 (08)
  • [6] Activity Recognition From Newborn Resuscitation Videos
    Meinich-Bache, Oyvind
    Austnes, Simon Lennart
    Engan, Kjersti
    Austvoll, Ivar
    Eftestol, Trygve
    Myklebust, Helge
    Kusulla, Simeon
    Kidanto, Hussein
    Ersdal, Hege
    IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, 2020, 24 (11) : 3258 - 3267
  • [7] A Survey on Human Activity Recognition from Videos
    Subetha, T.
    Chitrakala, S.
    2016 INTERNATIONAL CONFERENCE ON INFORMATION COMMUNICATION AND EMBEDDED SYSTEMS (ICICES), 2016
  • [8] Spatio-Temporal Activity Detection and Recognition in Untrimmed Surveillance Videos
    Gkountakos, Konstantinos
    Touska, Despoina
    Ioannidis, Konstantinos
    Tsikrika, Theodora
    Vrochidis, Stefanos
    Kompatsiaris, Ioannis
    PROCEEDINGS OF THE 2021 INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL (ICMR '21), 2021 : 451 - 455
  • [9] Action Recognition from Egocentric Videos Using Random Walks
    Sahu, Abhimanyu
    Bhattacharya, Rajit
    Bhura, Pallabh
    Chowdhury, Ananda S.
    PROCEEDINGS OF 3RD INTERNATIONAL CONFERENCE ON COMPUTER VISION AND IMAGE PROCESSING, CVIP 2018, VOL 2, 2020, 1024 : 389 - 402
  • [10] DT-3DResNet-LSTM: An Architecture for Temporal Activity Recognition in Videos
    Yao, Li
    Qian, Ying
    ADVANCES IN MULTIMEDIA INFORMATION PROCESSING, PT I, 2018, 11164 : 622 - 632