TRandAugment: temporal random augmentation strategy for surgical activity recognition from videos

Cited by: 2
Authors
Ramesh, Sanat [1, 2]
Dall'Alba, Diego [1]
Gonzalez, Cristians [3, 5]
Yu, Tong [2]
Mascagni, Pietro [5, 6]
Mutter, Didier [3, 4, 5]
Marescaux, Jacques [4]
Fiorini, Paolo [1]
Padoy, Nicolas [2, 5]
Affiliations
[1] Univ Verona, Altair Robot Lab, I-37134 Verona, Italy
[2] Univ Strasbourg, CNRS, ICube, F-67000 Strasbourg, France
[3] Univ Hosp Strasbourg, F-67000 Strasbourg, France
[4] IRCAD, F-67000 Strasbourg, France
[5] IHU Strasbourg, Inst Image Guided Surg, F-67000 Strasbourg, France
[6] Fdn Policlin Univ Agostino Gemelli IRCCS, I-00168 Rome, Italy
Keywords
Data augmentation; Temporal augmentation; Surgical activity recognition; Temporal convolutional networks; Gastric bypass procedures; Cataract procedures
DOI
10.1007/s11548-023-02864-8
Chinese Library Classification number
R318 [Biomedical Engineering];
Discipline classification code
0831;
Abstract
Purpose: Automatic recognition of surgical activities from intraoperative surgical videos is crucial for developing intelligent support systems for computer-assisted interventions. Current state-of-the-art recognition methods are based on deep learning, where data augmentation has shown the potential to improve generalization. This has spurred work on automated and simplified augmentation strategies for image classification and object detection on datasets of still images. Extending such augmentation methods to videos is not straightforward, as the temporal dimension needs to be considered. Furthermore, surgical videos pose additional challenges, as they are composed of multiple, interconnected, long-duration activities.
Methods: This work proposes a new simplified augmentation method, called TRandAugment, specifically designed for long surgical videos. It treats each video as an assembly of temporal segments and applies consistent but random transformations to each segment. The proposed augmentation method is used to train an end-to-end spatiotemporal model consisting of a CNN (ResNet50) followed by a TCN.
Results: The effectiveness of the proposed method is demonstrated on two surgical video datasets, Bypass40 and CATARACTS, and on two tasks, surgical phase recognition and surgical step recognition. TRandAugment adds a performance boost of 1-6% over previous state-of-the-art methods, which use manually designed augmentations.
Conclusion: This work presents a simplified and automated augmentation method for long surgical videos. The proposed method has been validated on different datasets and tasks, indicating the importance of devising temporal augmentation methods for long surgical videos.
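The segment-wise idea described in the Methods paragraph can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the operation set, how segments are chosen, or how magnitudes are sampled, so the names `segment_augment`, `split_into_segments`, `OPS`, `num_segments`, and `max_magnitude`, the three toy operations, and the uniform sampling are all assumptions made for illustration. Frames are tiny nested lists of pixel values in [0, 1] so that the example needs only the Python standard library.

```python
import random
from typing import List, Optional, Tuple

Frame = List[List[float]]  # toy grayscale frame, pixel values in [0, 1]


def _clip(value: float) -> float:
    return min(1.0, max(0.0, value))


def identity(frame: Frame, magnitude: float) -> Frame:
    """Return a copy of the frame unchanged."""
    return [row[:] for row in frame]


def brightness(frame: Frame, magnitude: float) -> Frame:
    """Add a constant offset to every pixel, then clip to [0, 1]."""
    return [[_clip(p + magnitude) for p in row] for row in frame]


def contrast(frame: Frame, magnitude: float) -> Frame:
    """Stretch pixel values away from mid-gray, then clip to [0, 1]."""
    return [[_clip(0.5 + (p - 0.5) * (1.0 + magnitude)) for p in row] for row in frame]


# Hypothetical operation pool; the paper's actual pool is not given in the abstract.
OPS = [identity, brightness, contrast]


def split_into_segments(
    num_frames: int, num_segments: int, rng: random.Random
) -> List[Tuple[int, int]]:
    """Cut [0, num_frames) into contiguous, non-empty (start, end) segments."""
    num_segments = max(1, min(num_segments, num_frames))
    cuts = sorted(rng.sample(range(1, num_frames), num_segments - 1))
    bounds = [0] + cuts + [num_frames]
    return list(zip(bounds[:-1], bounds[1:]))


def segment_augment(
    video: List[Frame],
    num_segments: int,
    max_magnitude: float,
    seed: Optional[int] = None,
) -> Tuple[List[Frame], List[Tuple[int, int, str, float]]]:
    """Apply one randomly drawn operation per temporal segment.

    The operation and its magnitude are sampled once per segment and then
    applied to every frame in that segment (consistent within a segment,
    random across segments). Returns the augmented frames and the plan used.
    """
    rng = random.Random(seed)
    augmented: List[Frame] = []
    plan: List[Tuple[int, int, str, float]] = []
    for start, end in split_into_segments(len(video), num_segments, rng):
        op = rng.choice(OPS)
        magnitude = rng.uniform(0.0, max_magnitude)
        augmented.extend(op(frame, magnitude) for frame in video[start:end])
        plan.append((start, end, op.__name__, magnitude))
    return augmented, plan
```

Calling `segment_augment(video, num_segments=3, max_magnitude=0.3, seed=0)` on a list of frames returns the augmented frames together with a plan listing, for each segment, its frame range, the chosen operation, and the sampled magnitude. Sampling once per segment rather than once per frame keeps the appearance stable within a segment, which is the property the abstract describes as "consistent but random".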
Pages: 1665-1672
Number of pages: 8