PEg TRAnsfer Workflow recognition challenge report: Do multimodal data improve recognition?

Cited by: 0
Authors
Huaulme, Arnaud [1 ]
Harada, Kanako [2 ]
Nguyen, Quang-Minh [1 ]
Park, Bogyu [3 ]
Hong, Seungbum [3 ]
Choi, Min-Kook [3 ]
Peven, Michael [4 ]
Li, Yunshuang [5 ]
Long, Yonghao [6 ]
Dou, Qi [6 ]
Kumar, Satyadwyoom [7 ]
Lalithkumar, Seenivasan [8 ]
Ren, Hongliang [8 ,9 ]
Matsuzaki, Hiroki [10 ]
Ishikawa, Yuto [10 ]
Harai, Yuriko [10 ]
Kondo, Satoshi [11 ]
Mitsuishi, Mamoru [2 ]
Jannin, Pierre [1 ]
Affiliations
[1] Univ Rennes, INSERM, LTSI UMR 1099, F-35000 Rennes, France
[2] Univ Tokyo, Dept Mech Engn, Tokyo 1138656, Japan
[3] VisionAI hutom, Seoul, South Korea
[4] Johns Hopkins Univ, Baltimore, MD USA
[5] Zhejiang Univ, Hangzhou, Peoples R China
[6] Chinese Univ Hong Kong, Dept Comp Sci & Engn, Hong Kong, Peoples R China
[7] Netaji Subhas Univ Technol, Delhi, India
[8] Natl Univ Singapore, Singapore, Singapore
[9] Chinese Univ Hong Kong, Hong Kong, Peoples R China
[10] Japan East Hosp, Natl Canc Ctr, Tokyo 1040045, Japan
[11] Muroran Inst Technol, Hokkaido, Japan
Keywords
Surgical process model; Workflow recognition; Multimodal; OR of the future
DOI: 10.1016/j.cmpb.2023.107561
Chinese Library Classification (CLC): TP39 [Applications of computers]
Discipline codes: 081203; 0835
Abstract
Background and objective: To be context-aware, computer-assisted surgical systems require accurate, real-time automatic surgical workflow recognition. In recent years, surgical video has been the most commonly used modality for surgical workflow recognition. With the democratization of robot-assisted surgery, however, new modalities, such as kinematics, have become accessible. Some previous methods use these new modalities as input for their models, but their added value has rarely been studied. This paper presents the design and results of the "PEg TRAnsfer Workflow recognition" (PETRAW) challenge, whose objective was to develop surgical workflow recognition methods based on one or more modalities and to study their added value.

Methods: The PETRAW challenge provided a data set of 150 peg transfer sequences performed on a virtual simulator. This data set included videos, kinematic data, semantic segmentation data, and annotations describing the workflow at three levels of granularity: phase, step, and activity. Five tasks were proposed to the participants: three addressed recognition at all granularities simultaneously using a single modality, and two addressed recognition using multiple modalities. The mean application-dependent balanced accuracy (AD-Accuracy) was used as the evaluation metric because it accounts for class imbalance and is more clinically relevant than a frame-by-frame score.

Results: Seven teams participated in at least one task, and four participated in every task. The best results were obtained by combining video and kinematic data (AD-Accuracy between 90% and 93% for the four teams that participated in all tasks).

Conclusion: For all teams, multimodal surgical workflow recognition methods improved significantly over unimodal ones. However, the longer execution time required by video/kinematics-based methods (compared with kinematics-only methods) must be considered: one must ask whether it is wise to increase computing time by 2,000% to 20,000% only to gain 3% in accuracy. The PETRAW data set is publicly available at www.synapse.org/PETRAW to encourage further research in surgical workflow recognition. (C) 2023 Elsevier B.V. All rights reserved.
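The record does not reproduce the formal definition of AD-Accuracy; the official metric is specified in the full paper (DOI above) and includes application-dependent weighting. As a rough, hypothetical sketch of the underlying idea only (per-class balanced accuracy averaged over the phase/step/activity granularities), consider the following Python fragment; the names `balanced_accuracy` and `multi_granularity_score` are illustrative assumptions, not the challenge's implementation.

```python
# Hypothetical sketch of a balanced, multi-granularity accuracy score.
# The official PETRAW AD-Accuracy is defined in the paper and differs
# in its application-dependent weighting; this only illustrates the
# "balanced over classes, averaged over granularities" idea.
import numpy as np

def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall over the classes present in y_true."""
    recalls = []
    for cls in np.unique(y_true):
        mask = (y_true == cls)
        recalls.append(np.mean(y_pred[mask] == cls))
    return float(np.mean(recalls))

def multi_granularity_score(truth, pred):
    """Average balanced accuracy over the three annotation levels."""
    levels = ("phase", "step", "activity")
    return float(np.mean([balanced_accuracy(truth[l], pred[l]) for l in levels]))

# Toy frame-wise labels for a four-frame sequence (illustrative only):
truth = {"phase":    np.array([0, 0, 1, 1]),
         "step":     np.array([0, 1, 1, 2]),
         "activity": np.array([3, 3, 0, 2])}
pred  = {"phase":    np.array([0, 1, 1, 1]),
         "step":     np.array([0, 1, 2, 2]),
         "activity": np.array([3, 0, 0, 2])}
print(multi_granularity_score(truth, pred))  # ~0.806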
Pages: 18
Related papers
50 records in total
  • [41] Summary of the Third Nurse Care Activity Recognition Challenge - Can We Do from the Field Data?
    Alia, Sayeda Shamma
    Adachi, Kohei
    Hossain, Tahera
    Nhat Tan Le
    Kaneko, Haru
    Lago, Paula
    Okita, Tsuyoshi
    Inoue, Sozo
    UBICOMP/ISWC '21 ADJUNCT: PROCEEDINGS OF THE 2021 ACM INTERNATIONAL JOINT CONFERENCE ON PERVASIVE AND UBIQUITOUS COMPUTING AND PROCEEDINGS OF THE 2021 ACM INTERNATIONAL SYMPOSIUM ON WEARABLE COMPUTERS, 2021, : 428 - 433
  • [42] Soft Tactile Sensor With Multimodal Data Processing for Texture Recognition
    Martinez-Hernandez, Uriel
    Assaf, Tareq
    IEEE SENSORS LETTERS, 2023, 7 (08)
  • [43] Improving Sign Language Recognition Performance Using Multimodal Data
    Nishimura, Tomoe
    Abbasi, Bahareh
    2024 IEEE INTERNATIONAL CONFERENCE ON INFORMATION REUSE AND INTEGRATION FOR DATA SCIENCE, IRI 2024, 2024, : 184 - 189
  • [44] Multimodal Analysis of Unbalanced Dermatological Data for Skin Cancer Recognition
    Lyakhov, Pavel A.
    Lyakhova, Ulyana A.
    Kalita, Diana I.
    IEEE ACCESS, 2023, 11 : 131487 - 131507
  • [45] Review on Human Action Recognition Methods Based on Multimodal Data
    Wang, Cailing
    Yan, Jingjing
    Zhang, Zhidong
COMPUTER ENGINEERING AND APPLICATIONS, 60 (09) : 1 - 18
  • [46] Exploring Fusion Methods for Multimodal Emotion Recognition with Missing Data
    Wagner, Johannes
    Lingenfelser, Florian
    Andre, Elisabeth
    Kim, Jonghwa
    IEEE TRANSACTIONS ON AFFECTIVE COMPUTING, 2011, 2 (04) : 206 - 218
  • [47] Multimodal Emotion Recognition using EEG and Eye Tracking Data
    Zheng, Wei-Long
    Dong, Bo-Nan
    Lu, Bao-Liang
    2014 36TH ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY (EMBC), 2014, : 5040 - 5043
  • [48] Canonical Correlation Analysis for Data Fusion in Multimodal Emotion Recognition
    Nemati, Shahla
    2018 9TH INTERNATIONAL SYMPOSIUM ON TELECOMMUNICATIONS (IST), 2018, : 676 - 681
  • [49] Online Learning for Multimodal Data Fusion With Application to Object Recognition
    Shahrampour, Shahin
    Noshad, Mohammad
    Ding, Jie
    Tarokh, Vahid
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II-EXPRESS BRIEFS, 2018, 65 (09) : 1259 - 1263
  • [50] Improving Multimodal Speech Recognition by Data Augmentation and Speech Representations
    Oneata, Dan
    Cucu, Horia
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2022, 2022, : 4578 - 4587