Dynamic Scene Graph Representation for Surgical Video

Cited by: 4
Authors
Holm, Felix [1]
Ghazaei, Ghazal [2]
Czempiel, Tobias [1]
Oezsoy, Ege [1]
Saur, Stefan [3]
Navab, Nassir [1]
Affiliations
[1] Tech Univ Munich, Chair Comp Aided Med Procedures, Munich, Germany
[2] Carl Zeiss, Oberkochen, Germany
[3] Carl Zeiss Meditec AG, Jena, Germany
DOI
10.1109/ICCVW60793.2023.00015
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Surgical videos captured from microscopic or endoscopic imaging devices are rich but complex sources of information, depicting different tools and anatomical structures utilized during an extended amount of time. Despite containing crucial workflow information and being commonly recorded in many procedures, usage of surgical videos for automated surgical workflow understanding is still limited. In this work, we exploit scene graphs as a more holistic, semantically meaningful and human-readable way to represent surgical videos while encoding all anatomical structures, tools, and their interactions. To properly evaluate the impact of our solutions, we create a scene graph dataset from semantic segmentations from the CaDIS and CATARACTS datasets. We demonstrate that scene graphs can be leveraged through the use of graph convolutional networks (GCNs) to tackle surgical downstream tasks such as surgical workflow recognition with competitive performance. Moreover, we demonstrate the benefits of surgical scene graphs regarding the explainability and robustness of model decisions, which are crucial in the clinical setting.
Pages: 81-87
Page count: 7