Survival Prediction via Hierarchical Multimodal Co-Attention Transformer: A Computational Histology-Radiology Solution

Cited by: 11
Authors
Li, Zhe [1 ]
Jiang, Yuming [2 ]
Lu, Mengkang [1 ]
Li, Ruijiang [2 ]
Xia, Yong [1 ]
Affiliations
[1] Northwestern Polytech Univ, Sch Comp Sci & Engn, Natl Engn Lab Integrated Aero Space Ground Ocean Big Data Applicat Technol, Xian 710072, Peoples R China
[2] Stanford Univ, Dept Radiat Oncol, Sch Med, Stanford, CA 94304 USA
Funding
National Natural Science Foundation of China;
Keywords
Radiology; Histopathology; Feature extraction; Cancer; Transformers; Predictive models; Tumors; Multimodal learning; radiology imaging; histology imaging; vision transformer; survival analysis;
DOI
10.1109/TMI.2023.3263010
Chinese Library Classification
TP39 [Computer Applications];
Subject Classification Codes
081203; 0835;
Abstract
The rapid advances in deep learning-based computational pathology and radiology have demonstrated the promise of using whole slide images (WSIs) and radiology images for survival prediction in cancer patients. However, most image-based survival prediction methods use either histology or radiology alone, leaving integrated approaches across the two modalities relatively underdeveloped. There are two main challenges in integrating WSIs and radiology images: (1) the gigapixel nature of WSIs and (2) the vast difference in spatial scales between WSIs and radiology images. To address these challenges, we propose an interpretable, weakly supervised, multimodal learning framework, called the Hierarchical Multimodal Co-Attention Transformer (HMCAT), to integrate WSIs and radiology images for survival prediction. Our approach first uses hierarchical feature extractors to capture information at multiple scales in WSIs, including cellular features, cellular organization, and tissue phenotypes. The hierarchical radiology-guided co-attention (HRCA) module in HMCAT then characterizes the multimodal interactions between hierarchical histology-based visual concepts and radiology features, learning hierarchical co-attention mappings for the two modalities. Finally, HMCAT combines their complementary information into a multimodal risk score and identifies prognostic features from both modalities through multimodal interpretability. We apply our approach to two cancer datasets (365 WSIs with matched magnetic resonance [MR] images and 213 WSIs with matched computed tomography [CT] images). Our results demonstrate that HMCAT consistently outperforms unimodal approaches trained on either histology or radiology data alone, as well as other state-of-the-art methods.
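The abstract describes radiology-guided co-attention, in which radiology features query hierarchical histology embeddings before being fused into a risk score. The sketch below is a minimal, hypothetical PyTorch rendering of that idea under stated assumptions; it is not the authors' released code. The module names (`RadiologyGuidedCoAttention`, `MultimodalRiskHead`), the embedding size, the number of hierarchy levels, and the mean-pooling fusion are all illustrative choices.

```python
# Minimal sketch (assumption: not the authors' implementation).
# Radiology tokens act as queries over histology patch tokens; the
# returned attention map supports interpretability.
import torch
import torch.nn as nn


class RadiologyGuidedCoAttention(nn.Module):
    """Hypothetical co-attention block: radiology queries histology."""

    def __init__(self, dim: int = 256, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, rad_tokens, hist_tokens):
        # rad_tokens: (B, N_r, dim) radiology features (e.g., CT/MR embeddings)
        # hist_tokens: (B, N_h, dim) histology embeddings at one WSI scale
        attended, attn_map = self.attn(
            query=rad_tokens, key=hist_tokens, value=hist_tokens
        )
        # attn_map: (B, N_r, N_h); it could be projected back onto the WSI
        # to visualize which tissue regions drive the prediction.
        return self.norm(attended + rad_tokens), attn_map


class MultimodalRiskHead(nn.Module):
    """Hypothetical fusion head: pooled co-attended features -> risk score."""

    def __init__(self, dim: int = 256, num_scales: int = 3):
        super().__init__()
        # One co-attention block per histology hierarchy level
        # (e.g., cell-, patch-, and region-level embeddings).
        self.blocks = nn.ModuleList(
            RadiologyGuidedCoAttention(dim) for _ in range(num_scales)
        )
        self.risk = nn.Linear(dim * num_scales, 1)

    def forward(self, rad_tokens, hist_scales):
        # hist_scales: list of (B, N_h, dim) tensors, one per hierarchy level.
        pooled = []
        for block, hist in zip(self.blocks, hist_scales):
            fused, _ = block(rad_tokens, hist)
            pooled.append(fused.mean(dim=1))  # average over radiology tokens
        return self.risk(torch.cat(pooled, dim=-1))  # (B, 1) risk score


# Toy usage with random tensors standing in for extracted embeddings.
rad = torch.randn(2, 8, 256)
hists = [torch.randn(2, n, 256) for n in (4096, 1024, 256)]
risk = MultimodalRiskHead()(rad, hists)
print(risk.shape)  # torch.Size([2, 1])
```

In a survival setting such as the one described, the scalar output would typically be trained with a Cox partial-likelihood or discrete-time survival loss over patient batches rather than a standard regression loss.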
Pages: 2678-2689
Page count: 12
Related Papers
10 entries in total
  • [1] Multimodal Co-Attention Transformer for Survival Prediction in Gigapixel Whole Slide Images
    Chen, Richard J.
    Lu, Ming Y.
    Weng, Wei-Hung
    Chen, Tiffany Y.
    Williamson, Drew F. K.
    Manz, Trevor
    Shady, Maha
    Mahmood, Faisal
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 3995 - 4005
  • [2] Multimodal Optimal Transport-based Co-Attention Transformer with Global Structure Consistency for Survival Prediction
    Xu, Yingxue
    Chen, Hao
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 21184 - 21194
  • [3] TRAFMEL: Multimodal Entity Linking Based on Transformer Reranking and Multimodal Co-Attention Fusion
    Zhang, Xiaoming
    Meng, Kaikai
    Wang, Huiyong
    INTERNATIONAL JOURNAL OF SOFTWARE ENGINEERING AND KNOWLEDGE ENGINEERING, 2024, 34 (06) : 973 - 997
  • [4] TopicCAT: Unsupervised Topic-Guided Co-Attention Transformer for Extreme Multimodal Summarisation
    Tang, Peggy
    Hu, Kun
    Zhang, Lei
    Gao, Junbin
    Luo, Jiebo
    Wang, Zhiyong
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 6643 - 6652
  • [5] Improving Facial Attractiveness Prediction via Co-Attention Learning
    Shi, Shengjie
    Gao, Fei
    Meng, Xuantong
    Xu, Xingxin
    Zhu, Jingjie
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 4045 - 4049
  • [6] Effective Image Tampering Localization via Enhanced Transformer and Co-Attention Fusion
    Guo, Kun
    Zhu, Haochen
    Cao, Gang
    2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, ICASSP 2024, 2024, : 4895 - 4899
  • [7] Drug target interaction prediction via multi-task co-attention
    Weng, Yuyou
    Liu, Xinyi
    Li, Hui
    Lin, Chen
    Liang, Yun
    INTERNATIONAL JOURNAL OF DATA MINING AND BIOINFORMATICS, 2020, 24 (02) : 160 - 176
  • [8] Multimodal Human-Exoskeleton Interface for Lower Limb Movement Prediction Through a Dense Co-Attention Symmetric Mechanism
    Shi, Kecheng
    Mu, Fengjun
    Huang, Rui
    Huang, Ke
    Peng, Zhinan
    Zou, Chaobin
    Yang, Xiao
    Cheng, Hong
    FRONTIERS IN NEUROSCIENCE, 2022, 16
  • [9] Multimodal parallel bilinear co-attention fusion with full-dimensional dynamic wavelet-kernel structure for machinery RUL prediction
    Wang, Yuan
    Lei, Yaguo
    Li, Naipeng
    Li, Xiang
    Yang, Bin
    Gao, Xuanyu
    Liu, Xiaofei
    MEASUREMENT, 2025, 247
  • [10] Prediction of protein-ATP binding residues using multi-view feature learning via contextual-based co-attention network
    Wu, Jia-Shun
    Liu, Yan
    Ge, Fang
    Yu, Dong-Jun
    COMPUTERS IN BIOLOGY AND MEDICINE, 2024, 172