PEg TRAnsfer Workflow recognition challenge report: Do multimodal data improve recognition?

Times cited: 0
Authors
Huaulme, Arnaud [1]
Harada, Kanako [2]
Nguyen, Quang-Minh [1]
Park, Bogyu [3]
Hong, Seungbum [3]
Choi, Min-Kook [3]
Peven, Michael [4]
Li, Yunshuang [5]
Long, Yonghao [6]
Dou, Qi [6]
Kumar, Satyadwyoom [7]
Lalithkumar, Seenivasan [8]
Ren, Hongliang [8,9]
Matsuzaki, Hiroki [10]
Ishikawa, Yuto [10]
Harai, Yuriko [10]
Kondo, Satoshi [11]
Mitsuishi, Mamoru [2]
Jannin, Pierre [1]
Affiliations
[1] Univ Rennes, INSERM, LTSI UMR 1099, F-35000 Rennes, France
[2] Univ Tokyo, Dept Mech Engn, Tokyo 1138656, Japan
[3] VisionAI hutom, Seoul, South Korea
[4] Johns Hopkins Univ, Baltimore, MD USA
[5] Zhejiang Univ, Hangzhou, Peoples R China
[6] Chinese Univ Hong Kong, Dept Comp Sci & Engn, Hong Kong, Peoples R China
[7] Netaji Subhas Univ Technol, Delhi, India
[8] Natl Univ Singapore, Singapore, Singapore
[9] Chinese Univ Hong Kong, Hong Kong, Peoples R China
[10] Japan East Hosp, Natl Canc Ctr, Tokyo 1040045, Japan
[11] Muroran Inst Technol, Hokkaido, Japan
Keywords
Surgical process model; Workflow recognition; Multimodal; OR of the future
DOI
10.1016/j.cmpb.2023.107561
CLC number
TP39 [Computer applications]
Subject classification codes
081203; 0835
Abstract
Background and objective: To be context-aware, computer-assisted surgical systems require accurate, real-time automatic surgical workflow recognition. In the past several years, surgical video has been the most commonly used modality for surgical workflow recognition. However, with the democratization of robot-assisted surgery, new modalities such as kinematics are now accessible. Some previous methods use these new modalities as input for their models, but their added value has rarely been studied. This paper presents the design and results of the "PEg TRAnsfer Workflow recognition" (PETRAW) challenge, whose objective was to develop surgical workflow recognition methods based on one or more modalities and to study their added value.

Methods: The PETRAW challenge provided a data set of 150 peg transfer sequences performed on a virtual simulator. This data set included videos, kinematic data, semantic segmentation data, and annotations describing the workflow at three levels of granularity: phase, step, and activity. Five tasks were proposed to the participants: three addressed recognition at all granularities simultaneously using a single modality, and two addressed recognition using multiple modalities. The mean application-dependent balanced accuracy (AD-Accuracy) was used as the evaluation metric because it accounts for class imbalance and is more clinically relevant than frame-by-frame accuracy.

Results: Seven teams participated in at least one task, and four participated in every task. The best results were obtained by combining video and kinematic data (AD-Accuracy between 90% and 93% for the four teams that participated in all tasks).

Conclusion: The improvement of surgical workflow recognition methods using multiple modalities compared with unimodal methods was significant for all teams. However, the longer execution time required by video/kinematic-based methods compared with kinematic-only methods must be considered. Indeed, one must ask whether it is wise to increase computing time by 2,000 to 20,000% only to increase accuracy by 3%. The PETRAW data set is publicly available at www.synapse.org/PETRAW to encourage further research in surgical workflow recognition.

(c) 2023 Elsevier B.V. All rights reserved.
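To illustrate the class-balancing idea behind the metric, the minimal Python sketch below computes per-class recall averaged over classes, then averages that score over the granularity levels. All function names and toy labels are hypothetical; the challenge's actual AD-Accuracy additionally applies application-dependent adjustments defined in the paper, which are not reproduced here.

from collections import defaultdict

def balanced_accuracy(truth, pred):
    # Mean per-class recall over the classes present in the ground truth,
    # so rare classes weigh as much as frequent ones.
    correct, total = defaultdict(int), defaultdict(int)
    for t, p in zip(truth, pred):
        total[t] += 1
        correct[t] += int(t == p)
    return sum(correct[c] / total[c] for c in total) / len(total)

def mean_multi_granularity_accuracy(truth_by_level, pred_by_level):
    # Average the class-balanced score over the granularity levels
    # (phase, step, and activity in PETRAW).
    levels = list(truth_by_level)
    return sum(balanced_accuracy(truth_by_level[lvl], pred_by_level[lvl])
               for lvl in levels) / len(levels)

# Toy six-frame example with two granularity levels.
truth = {"phase": ["idle"] * 3 + ["transfer"] * 3,
         "step": ["wait"] * 2 + ["grasp"] * 2 + ["drop"] * 2}
pred = {"phase": ["idle"] * 2 + ["transfer"] * 4,
        "step": ["wait"] * 2 + ["grasp"] * 1 + ["drop"] * 3}
print(mean_multi_granularity_accuracy(truth, pred))  # ~0.833

Unlike plain frame-by-frame accuracy, this score is not dominated by long phases such as idling, which is why a class-balanced formulation is preferred for workflow recognition.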
Pages: 18