TRandAugment: temporal random augmentation strategy for surgical activity recognition from videos

Cited by: 2
Authors
Ramesh, Sanat [1, 2]
Dall'Alba, Diego [1]
Gonzalez, Cristians [3, 5]
Yu, Tong [2]
Mascagni, Pietro [5, 6]
Mutter, Didier [3, 4, 5]
Marescaux, Jacques [4]
Fiorini, Paolo [1]
Padoy, Nicolas [2, 5]
Affiliations
[1] Univ Verona, Altair Robot Lab, I-37134 Verona, Italy
[2] Univ Strasbourg, CNRS, ICube, F-67000 Strasbourg, France
[3] Univ Hosp Strasbourg, F-67000 Strasbourg, France
[4] IRCAD, F-67000 Strasbourg, France
[5] IHU Strasbourg, Inst Image Guided Surg, F-67000 Strasbourg, France
[6] Fdn Policlin Univ Agostino Gemelli IRCCS, I-00168 Rome, Italy
Keywords
Data augmentation; Temporal augmentation; Surgical activity recognition; Temporal convolutional networks; Gastric bypass procedures; Cataract procedures
DOI
10.1007/s11548-023-02864-8
Chinese Library Classification number
R318 [Biomedical engineering]
Discipline classification code
0831
Abstract
Purpose: Automatic recognition of surgical activities from intraoperative surgical videos is crucial for developing intelligent support systems for computer-assisted interventions. Current state-of-the-art recognition methods are based on deep learning, where data augmentation has shown the potential to improve generalization. This has spurred work on automated and simplified augmentation strategies for image classification and object detection on datasets of still images. Extending such augmentation methods to videos is not straightforward, as the temporal dimension needs to be considered. Furthermore, surgical videos pose additional challenges, as they are composed of multiple, interconnected, long-duration activities.
Methods: This work proposes a new simplified augmentation method, called TRandAugment, specifically designed for long surgical videos. It treats each video as an assembly of temporal segments and applies consistent but random transformations to each segment. The proposed augmentation method is used to train an end-to-end spatiotemporal model consisting of a CNN (ResNet50) followed by a TCN.
Results: The effectiveness of the proposed method is demonstrated on two surgical video datasets, Bypass40 and CATARACTS, and on two tasks, surgical phase recognition and step recognition. TRandAugment improves performance by 1-6% over previous state-of-the-art methods, which use manually designed augmentations.
Conclusion: This work presents a simplified and automated augmentation method for long surgical videos. The proposed method has been validated on different datasets and tasks, indicating the importance of devising temporal augmentation methods for long surgical videos.
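The segment-wise idea described in the Methods part of the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the operation names in `OPS`, the equal-length split, the `max_magnitude` range, and the helper names `segment_bounds` and `plan_augmentations` are all assumptions made for illustration. The sketch only plans which transformation each frame would receive, so that all frames in a segment share one choice while segments differ at random.

```python
import random

# Hypothetical operation names; the abstract does not list the actual transformations.
OPS = ["rotate", "shear_x", "brightness", "contrast"]


def segment_bounds(num_frames, num_segments):
    """Split range(num_frames) into contiguous, near-equal (start, end) segments.

    Equal-length splitting is an assumption of this sketch.
    """
    base, extra = divmod(num_frames, num_segments)
    bounds, start = [], 0
    for i in range(num_segments):
        end = start + base + (1 if i < extra else 0)
        bounds.append((start, end))
        start = end
    return bounds


def plan_augmentations(num_frames, num_segments, max_magnitude=10, seed=None):
    """Return one (op, magnitude) pair per frame.

    Every frame in a segment shares the same pair (consistent within a segment),
    while each segment draws its pair independently (random across segments).
    """
    rng = random.Random(seed)
    plan = []
    for start, end in segment_bounds(num_frames, num_segments):
        choice = (rng.choice(OPS), rng.randint(0, max_magnitude))
        plan.extend([choice] * (end - start))
    return plan


plan = plan_augmentations(num_frames=10, num_segments=3, seed=0)
```

In a real pipeline, each planned pair would be handed to an image-transformation routine applied frame by frame; keeping the plan separate from the pixel operations makes the within-segment consistency easy to check.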
Pages: 1665-1672
Number of pages: 8
Related papers
50 in total
  • [31] Semi-Supervised Action Recognition From Temporal Augmentation Using Curriculum Learning
    Tong, Anyang
    Tang, Chao
    Wang, Wenjian
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33 (03) : 1305 - 1319
  • [32] Temporal Memory Relation Network for Workflow Recognition From Surgical Video
    Jin, Yueming
    Long, Yonghao
    Chen, Cheng
    Zhao, Zixu
    Dou, Qi
    Heng, Pheng-Ann
    IEEE TRANSACTIONS ON MEDICAL IMAGING, 2021, 40 (07) : 1911 - 1923
  • [33] SV-RCNet: Workflow Recognition From Surgical Videos Using Recurrent Convolutional Network
    Jin, Yueming
    Dou, Qi
    Chen, Hao
    Yu, Lequan
    Qin, Jing
    Fu, Chi-Wing
    Heng, Pheng-Ann
    IEEE TRANSACTIONS ON MEDICAL IMAGING, 2018, 37 (05) : 1114 - 1126
  • [34] EgoSurgery-Phase: A Dataset of Surgical Phase Recognition from Egocentric Open Surgery Videos
    Fujii, Ryo
    Hatano, Masashi
    Saito, Hideo
    Kajita, Hiroki
    MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION - MICCAI 2024, PT VI, 2024, 15006 : 187 - 196
  • [35] EgoSurgery-Phase: A Dataset of Surgical Phase Recognition from Egocentric Open Surgery Videos
    Keio University, Yokohama, Kanagawa, Japan
    Unknown
    arXiv,
  • [36] Weakly Supervised Temporal Convolutional Networks for Fine-Grained Surgical Activity Recognition
    Ramesh, Sanat
    Dall'Alba, Diego
    Gonzalez, Cristians
    Yu, Tong
    Mascagni, Pietro
    Mutter, Didier
    Marescaux, Jacques
    Fiorini, Paolo
    Padoy, Nicolas
    IEEE TRANSACTIONS ON MEDICAL IMAGING, 2023, 42 (09) : 2592 - 2602
  • [37] Discriminative temporal smoothing for activity recognition from wearable sensors
    Suutala, Jaakko
    Pirttikangas, Susanna
    Roning, Juha
    UBIQUITOUS COMPUTING SYSTEMS, PROCEEDINGS, 2007, 4836 : 182 - 195
  • [38] Individual Action and Group Activity Recognition in Soccer Videos from a Static Panoramic Camera
    Gerats, Beerend
    Bouma, Henri
    Uijens, Wouter
    Englebienne, Gwenn
    Spreeuwers, Luuk
    PROCEEDINGS OF THE 10TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION APPLICATIONS AND METHODS (ICPRAM), 2021, : 594 - 601
  • [39] A New Hybrid Architecture for Human Activity Recognition from RGB-D Videos
    Das, Srijan
    Thonnat, Monique
    Sakhalkar, Kaustubh
    Koperski, Michal
    Bremond, Francois
    Francesca, Gianpiero
    MULTIMEDIA MODELING, MMM 2019, PT II, 2019, 11296 : 493 - 505
  • [40] Human Activity Recognition from Automatically Labeled Data in RGB-D Videos
    Jardim, David
    Nunes, Luis
    Dias, Miguel
    2016 8TH COMPUTER SCIENCE AND ELECTRONIC ENGINEERING CONFERENCE (CEEC), 2016, : 89 - 94