PEg TRAnsfer Workflow recognition challenge report: Do multimodal data improve recognition?

Times cited: 0

Authors:

Huaulme, Arnaud [1]

Harada, Kanako [2]

Nguyen, Quang-Minh [1]

Park, Bogyu [3]

Hong, Seungbum [3]

Choi, Min-Kook [3]

Peven, Michael [4]

Li, Yunshuang [5]

Long, Yonghao [6]

Dou, Qi [6]

Kumar, Satyadwyoom [7]

Seenivasan, Lalithkumar [8]

Ren, Hongliang [8, 9]

Matsuzaki, Hiroki [10]

Ishikawa, Yuto [10]

Harai, Yuriko [10]

Kondo, Satoshi [11]

Mitsuishi, Mamoru [2]

Jannin, Pierre [1]

Affiliations:

[1] Univ Rennes, INSERM, LTSI UMR 1099, F-35000 Rennes, France

[2] Univ Tokyo, Dept Mech Engn, Tokyo 1138656, Japan

[3] VisionAI hutom, Seoul, South Korea

[4] Johns Hopkins Univ, Baltimore, MD USA

[5] Zhejiang Univ, Hangzhou, Peoples R China

[6] Chinese Univ Hong Kong, Dept Comp Sci & Engn, Hong Kong, Peoples R China

[7] Netaji Subhas Univ Technol, Delhi, India

[8] Natl Univ Singapore, Singapore, Singapore

[9] Chinese Univ Hong Kong, Hong Kong, Peoples R China

[10] Japan East Hosp, Natl Canc Ctr, Tokyo 1040045, Japan

[11] Muroran Inst Technol, Hokkaido, Japan

Source:

COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE | 2023 / Vol. 236

Keywords:

Surgical process model; Workflow recognition; Multimodal; OR of the future; VIDEOS; TASKS

DOI:

10.1016/j.cmpb.2023.107561

CLC classification number:

TP39 [Computer applications]

Subject classification codes:

081203; 0835

Abstract:

Background and objective: To be context-aware, computer-assisted surgical systems require accurate, real-time automatic surgical workflow recognition. In the past several years, surgical video has been the most commonly used modality for surgical workflow recognition. With the democratization of robot-assisted surgery, however, new modalities such as kinematics are now accessible. Some previous methods use these new modalities as input to their models, but their added value has rarely been studied. This paper presents the design and results of the "PEg TRAnsfer Workflow recognition" (PETRAW) challenge, whose objective was to develop surgical workflow recognition methods based on one or more modalities and to study their added value.

Methods: The PETRAW challenge included a data set of 150 peg transfer sequences performed on a virtual simulator. This data set included videos, kinematic data, semantic segmentation data, and annotations describing the workflow at three levels of granularity: phase, step, and activity. Five tasks were proposed to the participants: three addressed recognition at all granularities simultaneously using a single modality, and two addressed recognition using multiple modalities. The mean application-dependent balanced accuracy (AD-Accuracy) was used as the evaluation metric because it takes class balance into account and is more clinically relevant than a frame-by-frame score.

Results: Seven teams participated in at least one task, and four participated in every task. The best results were obtained by combining video and kinematic data (AD-Accuracy between 90% and 93% for the four teams that participated in all tasks).

Conclusion: For all teams, the improvement in surgical workflow recognition obtained with multiple modalities, compared with unimodal methods, was significant. However, the longer execution time required by video/kinematic-based methods (compared with kinematic-only methods) must be considered. Indeed, one must ask whether it is wise to increase computing time by 2,000 to 20,000% only to increase accuracy by 3%. The PETRAW data set is publicly available at www.synapse.org/PETRAW to encourage further research in surgical workflow recognition.

© 2023 Elsevier B.V. All rights reserved.
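To illustrate why a class-balanced score differs from a frame-by-frame score, the sketch below computes plain frame accuracy and standard balanced accuracy (the mean of per-class recall) for a per-frame label sequence. This is a minimal sketch under stated assumptions, not the challenge's official metric: the "application-dependent" adjustments of AD-Accuracy are not described in this abstract and are not reproduced here, averaging over granularity levels is an assumed reading of "mean", and the function names and label strings (`"idle"`, `"transfer"`) are hypothetical.

```python
from collections import defaultdict
from typing import Hashable, Mapping, Sequence, Tuple


def _check(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> None:
    # Both sequences hold one label per video/kinematic frame.
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("label sequences must be non-empty and of equal length")


def frame_accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Fraction of frames whose predicted label matches the ground truth."""
    _check(y_true, y_pred)
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Standard balanced accuracy: mean per-class recall over ground-truth classes.

    NOTE: this is NOT the PETRAW AD-Accuracy; its application-dependent
    adjustments are not modelled here.
    """
    _check(y_true, y_pred)
    total: dict = defaultdict(int)    # frames per ground-truth class
    correct: dict = defaultdict(int)  # correctly predicted frames per class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)


def mean_over_granularities(
    levels: Mapping[str, Tuple[Sequence[Hashable], Sequence[Hashable]]]
) -> float:
    """Assumed aggregation: average balanced accuracy over granularity levels
    (e.g. phase, step, activity), each given as (y_true, y_pred)."""
    scores = [balanced_accuracy(t, p) for t, p in levels.values()]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical imbalanced sequence: a model that always predicts "idle".
    truth = ["idle"] * 8 + ["transfer"] * 2
    pred = ["idle"] * 10
    print(frame_accuracy(truth, pred))     # → 0.8
    print(balanced_accuracy(truth, pred))  # → 0.5
```

The example shows the design motivation stated in the abstract: a trivial predictor that ignores the rare class still scores 80% frame by frame, while balanced accuracy exposes that one class is never recognized.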


Pages: 18
