PEg TRAnsfer Workflow recognition challenge report: Do multimodal data improve recognition?

Cited: 0
Authors
Huaulme, Arnaud [1]
Harada, Kanako [2]
Nguyen, Quang-Minh [1]
Park, Bogyu [3]
Hong, Seungbum [3]
Choi, Min-Kook [3]
Peven, Michael [4]
Li, Yunshuang [5]
Long, Yonghao [6]
Dou, Qi [6]
Kumar, Satyadwyoom [7]
Lalithkumar, Seenivasan [8]
Ren, Hongliang [8,9]
Matsuzaki, Hiroki [10]
Ishikawa, Yuto [10]
Harai, Yuriko [10]
Kondo, Satoshi [11]
Mitsuishi, Mamoru [2]
Jannin, Pierre [1]
Affiliations
[1] Univ Rennes, INSERM, LTSI UMR 1099, F-35000 Rennes, France
[2] Univ Tokyo, Dept Mech Engn, Tokyo 1138656, Japan
[3] VisionAI hutom, Seoul, South Korea
[4] Johns Hopkins Univ, Baltimore, MD USA
[5] Zhejiang Univ, Hangzhou, Peoples R China
[6] Chinese Univ Hong Kong, Dept Comp Sci & Engn, Hong Kong, Peoples R China
[7] Netaji Subhas Univ Technol, Delhi, India
[8] Natl Univ Singapore, Singapore, Singapore
[9] Chinese Univ Hong Kong, Hong Kong, Peoples R China
[10] Japan East Hosp, Natl Canc Ctr, Tokyo 1040045, Japan
[11] Muroran Inst Technol, Hokkaido, Japan
Keywords
Surgical process model; Workflow recognition; Multimodal; OR of the future; VIDEOS; TASKS;
DOI
10.1016/j.cmpb.2023.107561
Chinese Library Classification
TP39 [Computer applications]
Subject classification
081203; 0835
Abstract
Background and objective: To be context-aware, computer-assisted surgical systems require accurate, real-time automatic surgical workflow recognition. In the past several years, surgical video has been the most commonly used modality for surgical workflow recognition. However, with the democratization of robot-assisted surgery, new modalities, such as kinematics, are now accessible. Some previous methods use these new modalities as input for their models, but their added value has rarely been studied. This paper presents the design and results of the "PEg TRAnsfer Workflow recognition" (PETRAW) challenge, whose objective was to develop surgical workflow recognition methods based on one or more modalities and to study their added value.

Methods: The PETRAW challenge included a data set of 150 peg transfer sequences performed on a virtual simulator. This data set included videos, kinematic data, semantic segmentation data, and annotations, which described the workflow at three levels of granularity: phase, step, and activity. Five tasks were proposed to the participants: three were related to recognition at all granularities simultaneously using a single modality, and two addressed recognition using multiple modalities. The mean application-dependent balanced accuracy (AD-Accuracy) was used as the evaluation metric because it takes class balance into account and is more clinically relevant than a frame-by-frame score.

Results: Seven teams participated in at least one task, with four participating in every task. The best results were obtained by combining video and kinematic data (AD-Accuracy between 90% and 93% for the four teams that participated in all tasks).

Conclusion: The improvement of surgical workflow recognition methods using multiple modalities compared with unimodal methods was significant for all teams. However, the longer execution time required by video/kinematic-based methods compared with kinematics-only methods must be considered. Indeed, one must ask whether it is wise to increase computing time by 2,000 to 20,000% only to increase accuracy by 3%. The PETRAW data set is publicly available at www.synapse.org/PETRAW to encourage further research in surgical workflow recognition.

(c) 2023 Elsevier B.V. All rights reserved.
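The abstract reports scores as a balanced accuracy averaged over the three granularity levels (phase, step, activity). As a rough illustration only, the sketch below shows a plain per-class-recall balanced accuracy averaged over the three levels; the actual AD-Accuracy used in the challenge includes application-dependent terms defined in the paper itself, and all names here are illustrative, not taken from the challenge code.

```python
from collections import defaultdict

def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall over the classes present in y_true."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        correct[truth] += int(truth == pred)
    return sum(correct[c] / total[c] for c in total) / len(total)

def mean_granularity_score(truth_by_level, pred_by_level):
    """Average the per-level score over the three workflow granularities."""
    levels = ("phase", "step", "activity")
    return sum(balanced_accuracy(truth_by_level[lv], pred_by_level[lv])
               for lv in levels) / len(levels)

# Example with frame-by-frame labels at one granularity:
# balanced_accuracy(["idle", "transfer", "transfer"],
#                   ["idle", "idle", "transfer"])
# -> (1/1 + 1/2) / 2 = 0.75
```

Unlike a frame-by-frame accuracy, this per-class averaging prevents long, easy classes from dominating the score, which is why the challenge preferred a balanced metric.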
Pages: 18
Related papers
50 items in total
  • [31] System for multimodal data acquisition for human action recognition
    Filip Malawski
    Jakub Gałka
    Multimedia Tools and Applications, 2018, 77: 23825 - 23850
  • [32] Sign Language Recognition Analysis using Multimodal Data
    Hosain, Al Amin
    Santhalingam, Panneer Selvam
    Pathak, Parth
    Kosecka, Jana
    Rangwala, Huzefa
    2019 IEEE INTERNATIONAL CONFERENCE ON DATA SCIENCE AND ADVANCED ANALYTICS (DSAA 2019), 2019: 203 - 210
  • [33] Multimodal Emotion Recognition on RAVDESS Dataset Using Transfer Learning
    Luna-Jimenez, Cristina
    Griol, David
    Callejas, Zoraida
    Kleinlein, Ricardo
    Montero, Juan M.
    Fernandez-Martinez, Fernando
    SENSORS, 2021, 21 (22)
  • [34] Masked Face Recognition Challenge: The InsightFace Track Report
    Deng, Jiankang
    Guo, Jia
    An, Xiang
    Zhu, Zheng
    Zafeiriou, Stefanos
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW 2021), 2021: 1437 - 1444
  • [35] Transfer Learning to improve Arabic handwriting text Recognition
    Noubigh, Zouhaira
    Mezghani, Anis
    Kherallah, Monji
    2020 21ST INTERNATIONAL ARAB CONFERENCE ON INFORMATION TECHNOLOGY (ACIT), 2020
  • [36] Speech recognition: impact on workflow and report availability (Spracherkennung: Auswirkung auf Workflow und Befundverfügbarkeit)
    C. Glaser
    C. Trumm
    S. Nissen-Meyer
    M. Francke
    B. Küttner
    M. Reiser
    Der Radiologe, 2005, 45 (8): 735 - 742
  • [37] Tackling the ADReSS challenge: a multimodal approach to the automated recognition of Alzheimer's dementia
    Martinc, Matej
    Pollak, Senja
    INTERSPEECH 2020, 2020: 2157 - 2161
  • [38] The MuSe 2024 Multimodal Sentiment Analysis Challenge: Social Perception and Humor Recognition
    Amiriparian, Shahin
    Christ, Lukas
    Kathan, Alexander
    Gerczuk, Maurice
    Mueller, Niklas
    Klug, Steffen
    Stappen, Lukas
    Koenig, Andreas
    Cambria, Erik
    Schuller, Bjoern W.
    Eulitz, Simone
    PROCEEDINGS OF THE 5TH MULTIMODAL SENTIMENT ANALYSIS CHALLENGE AND WORKSHOP: SOCIAL PERCEPTION AND HUMOR, MUSE 2024, 2024: 1 - 9
  • [39] ChAirGest - A Challenge for Multimodal Mid-Air Gesture Recognition for Close HCI
    Ruffieux, Simon
    Lalanne, Denis
    Mugellini, Elena
    ICMI'13: PROCEEDINGS OF THE 2013 ACM INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION, 2013: 483 - 488
  • [40] Shape-from-recognition: Recognition enables meta-data transfer
    Thomas, Alexander
    Ferrari, Vittorio
    Leibe, Bastian
    Tuytelaars, Tinne
    Van Gool, Luc
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2009, 113 (12): 1222 - 1234