TRandAugment: temporal random augmentation strategy for surgical activity recognition from videos

Cited by: 2
Authors
Ramesh, Sanat [1, 2]
Dall'Alba, Diego [1]
Gonzalez, Cristians [3, 5]
Yu, Tong [2]
Mascagni, Pietro [5, 6]
Mutter, Didier [3, 4, 5]
Marescaux, Jacques [4]
Fiorini, Paolo [1]
Padoy, Nicolas [2, 5]
Affiliations
[1] Univ Verona, Altair Robot Lab, I-37134 Verona, Italy
[2] Univ Strasbourg, CNRS, ICube, F-67000 Strasbourg, France
[3] Univ Hosp Strasbourg, F-67000 Strasbourg, France
[4] IRCAD, F-67000 Strasbourg, France
[5] IHU Strasbourg, Inst Image Guided Surg, F-67000 Strasbourg, France
[6] Fdn Policlin Univ Agostino Gemelli IRCCS, I-00168 Rome, Italy
Keywords
Data augmentation; Temporal augmentation; Surgical activity recognition; Temporal convolutional networks; Gastric bypass procedures; Cataract procedures
DOI
10.1007/s11548-023-02864-8
Chinese Library Classification (CLC) number
R318 [Biomedical engineering]
Discipline classification code
0831
Abstract
Purpose: Automatic recognition of surgical activities from intraoperative surgical videos is crucial for developing intelligent support systems for computer-assisted interventions. Current state-of-the-art recognition methods are based on deep learning, where data augmentation has shown the potential to improve generalization. This has spurred work on automated and simplified augmentation strategies for image classification and object detection on datasets of still images. Extending such augmentation methods to videos is not straightforward, as the temporal dimension needs to be considered. Furthermore, surgical videos pose additional challenges because they are composed of multiple, interconnected, long-duration activities.
Methods: This work proposes a new simplified augmentation method, called TRandAugment, specifically designed for long surgical videos. It treats each video as an assembly of temporal segments and applies consistent but random transformations to each segment. The proposed augmentation method is used to train an end-to-end spatiotemporal model consisting of a CNN (ResNet50) followed by a TCN.
Results: The effectiveness of the proposed method is demonstrated on two surgical video datasets, Bypass40 and CATARACTS, and on two tasks, surgical phase recognition and step recognition. TRandAugment adds a performance boost of 1-6% over previous state-of-the-art methods that use manually designed augmentations.
Conclusion: This work presents a simplified and automated augmentation method for long surgical videos. The proposed method has been validated on different datasets and tasks, indicating the importance of devising temporal augmentation methods for long surgical videos.
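The segment-wise idea described in the Methods part of the abstract (split a video into temporal segments, then apply one randomly chosen transformation consistently to every frame of a segment) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the transformation set, the number of segments, or how magnitudes are sampled, so the three toy operations, the equal-length segmentation, and the names `segment_bounds`, `temporal_rand_augment`, `num_segments`, and `max_magnitude` are all assumptions made for illustration.

```python
import random
from typing import List, Tuple

Frame = List[List[float]]  # toy grayscale frame: rows of pixel values in [0, 255]


# Hypothetical transformation set; the abstract does not list the real one.
def identity(frame: Frame, magnitude: float) -> Frame:
    return [row[:] for row in frame]


def horizontal_flip(frame: Frame, magnitude: float) -> Frame:
    return [row[::-1] for row in frame]


def brightness(frame: Frame, magnitude: float) -> Frame:
    factor = 1.0 + magnitude
    return [[min(255.0, max(0.0, p * factor)) for p in row] for row in frame]


TRANSFORMS = [identity, horizontal_flip, brightness]


def segment_bounds(num_frames: int, num_segments: int) -> List[Tuple[int, int]]:
    """Split [0, num_frames) into contiguous, nearly equal segments (assumed scheme)."""
    num_segments = max(1, min(num_segments, num_frames))
    edges = [i * num_frames // num_segments for i in range(num_segments + 1)]
    return list(zip(edges[:-1], edges[1:]))


def temporal_rand_augment(video: List[Frame], num_segments: int,
                          max_magnitude: float, rng: random.Random):
    """Sample one (operation, magnitude) pair per segment and apply it to
    every frame in that segment, so the augmentation is random across
    segments but consistent within each one."""
    augmented: List[Frame] = []
    plan = []  # records what was applied, for inspection
    for start, end in segment_bounds(len(video), num_segments):
        op = rng.choice(TRANSFORMS)
        magnitude = rng.uniform(-max_magnitude, max_magnitude)
        plan.append((start, end, op.__name__, magnitude))
        augmented.extend(op(frame, magnitude) for frame in video[start:end])
    return augmented, plan


if __name__ == "__main__":
    rng = random.Random(0)
    # Ten tiny 1x3 frames standing in for a long surgical video.
    video = [[[float(t), float(t + 1), float(t + 2)]] for t in range(10)]
    augmented, plan = temporal_rand_augment(video, 3, 0.5, rng)
    for start, end, name, magnitude in plan:
        print(f"frames {start}-{end - 1}: {name} (magnitude {magnitude:+.2f})")
```

Sampling the operation once per segment, rather than once per frame, is what keeps consecutive frames visually coherent for a downstream temporal model; in a real pipeline the frames would be image tensors and the operations would come from an image augmentation library.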
Pages: 1665-1672
Number of pages: 8