Explainable Multi-Modal Deep Learning With Cross-Modal Attention for Diagnosis of Dyssynergic Defecation Using Abdominal X-Ray Images and Symptom Questionnaire

Times Cited: 0
Authors
Sangnark, Sirapob [1 ]
Rattanachaisit, Pakkapon [2 ,3 ]
Patcharatrakul, Tanisa [3 ,4 ]
Vateekul, Peerapon [1 ]
Affiliations
[1] Chulalongkorn Univ, Fac Engn, Dept Comp Engn, Bangkok 10330, Thailand
[2] Chulalongkorn Univ, Fac Med, Dept Physiol, Bangkok 10330, Thailand
[3] Chulalongkorn Univ, Fac Med, Ctr Excellence Neurogastroenterol & Motil, Bangkok 10330, Thailand
[4] King Chulalongkorn Mem Hosp, Dept Med, Div Gastroenterol, Thai Red Cross Soc, Bangkok 10330, Thailand
Source
IEEE ACCESS | 2024, Vol. 12
Keywords
Deep learning; X-ray imaging; Data models; Biomedical imaging; Diseases; Medical diagnostic imaging; Task analysis; Dyssynergic defecation; multi-modal; deep learning; attention mechanism; explainable AI; CLASSIFICATION; NETWORK;
DOI
10.1109/ACCESS.2024.3409077
CLC Classification Number
TP [Automation technology, computer technology];
Subject Classification Code
0812;
Abstract
Dyssynergic defecation (DD) is a type of functional constipation that requires specialized tests for diagnosis. However, these tests are available only in tertiary care because they require devices not found elsewhere. In this work, we present explainable multi-modal deep learning models that can pre-screen patients for DD using affordable data accessible in small hospitals, i.e., abdominal X-ray images and symptom questionnaires; the output classifies whether DD is present or not. To enhance the model's performance, we apply cross-modal attention to help the model find meaningful interactions between the two modalities. A convolutional block attention module (CBAM) is added to extract more important semantic and spatial features from the images. Masking augmentation is implemented to ignore irrelevant backgrounds in the images. Explainable AI techniques, namely gradient-weighted class activation mapping (Grad-CAM) and deep Shapley additive explanations (DeepSHAP), are used to explain which parts of the images and which symptom data matter for each patient. In our experiments, all models are run on 3 patient-based bootstraps. Our model is compared with single-modal models and human experts. Results demonstrate that our multi-modal model outperforms the single-modal models and achieves the highest sensitivity, specificity, F1 score, and accuracy (87.37%, 77.01%, 82.17%, and 82.27%, respectively). In addition, our model outperforms human experts, which shows its potential to assist them in diagnosing DD. This model is a novel clinical tool that combines symptom and image data for a more accurate diagnosis of DD.
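The abstract describes a cross-modal attention fusion between X-ray image features and symptom-questionnaire features. Below is a minimal sketch of such a fusion, assuming a PyTorch-style setup; the backbone choice, module names, feature dimensions, and number of questionnaire items are illustrative assumptions and not the authors' published implementation (CBAM and masking augmentation are omitted for brevity).

# Minimal sketch of cross-modal attention fusion between abdominal X-ray features
# and symptom-questionnaire features. All names and dimensions are assumptions.
import torch
import torch.nn as nn
import torchvision.models as models

class CrossModalAttentionClassifier(nn.Module):
    def __init__(self, num_symptoms: int = 20, dim: int = 256, heads: int = 4):
        super().__init__()
        # Image branch: ResNet-18 backbone (illustrative; CBAM not shown).
        backbone = models.resnet18(weights=None)
        backbone.fc = nn.Identity()          # expose 512-d pooled features
        self.image_encoder = backbone
        self.image_proj = nn.Linear(512, dim)
        # Symptom branch: small MLP over questionnaire responses.
        self.symptom_encoder = nn.Sequential(
            nn.Linear(num_symptoms, dim), nn.ReLU(), nn.Linear(dim, dim)
        )
        # Cross-modal attention: each modality queries the other.
        self.img_to_sym = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.sym_to_img = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Binary classifier over the fused representation (DD vs. non-DD).
        self.classifier = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, 1)
        )

    def forward(self, xray: torch.Tensor, symptoms: torch.Tensor) -> torch.Tensor:
        img = self.image_proj(self.image_encoder(xray)).unsqueeze(1)   # (B, 1, dim)
        sym = self.symptom_encoder(symptoms).unsqueeze(1)              # (B, 1, dim)
        # Cross-modal interactions: image features attend to symptoms and vice versa.
        img_att, _ = self.sym_to_img(query=img, key=sym, value=sym)
        sym_att, _ = self.img_to_sym(query=sym, key=img, value=img)
        fused = torch.cat([img_att, sym_att], dim=-1).squeeze(1)       # (B, 2*dim)
        return self.classifier(fused)                                  # DD logit

# Example forward pass with dummy data.
model = CrossModalAttentionClassifier()
logit = model(torch.randn(2, 3, 224, 224), torch.randn(2, 20))
print(logit.shape)  # torch.Size([2, 1])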
Pages: 78132-78147
Page count: 16
Related Papers
50 records in total
  • [11] Cross-modal Non-linear Guided Attention and Temporal Coherence in Multi-modal Deep Video Models
    Sahu, Saurabh
    Goyal, Palash
    Ghosh, Shalini
    Lee, Chul
    MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, 2020, : 313 - 321
  • [12] Mapping Multi-Modal Brain Connectome for Brain Disorder Diagnosis via Cross-Modal Mutual Learning
    Yang, Yanwu
    Ye, Chenfei
    Guo, Xutao
    Wu, Tao
    Xiang, Yang
    Ma, Ting
    IEEE TRANSACTIONS ON MEDICAL IMAGING, 2024, 43 (01) : 108 - 121
  • [13] Cross-modal retrieval of chest X-ray images and diagnostic reports based on report entity graph and dual attention
    Ou, Weihua
    Chen, Yingjie
    Liang, Linqing
    Gou, Jianping
    Xiong, Jiahao
    Zhang, Jiacheng
    Lai, Lingge
    Zhang, Lei
    MULTIMEDIA SYSTEMS, 2025, 31 (01)
  • [14] Deep Multi-Instance Learning Using Multi-Modal Data for Diagnosis of Lymphocytosis
    Sahasrabudhe, Mihir
    Sujobert, Pierre
    Zacharaki, Evangelia I.
    Maurin, Eugenie
    Grange, Beatrice
    Jallades, Laurent
    Paragios, Nikos
    Vakalopoulou, Maria
    IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, 2021, 25 (06) : 2125 - 2136
  • [15] Synthesizing images of tau pathology from cross-modal neuroimaging using deep learning
    Lee, Jeyeon
    Burkett, Brian J.
    Min, Hoon-Ki
    Senjem, Matthew L.
    Dicks, Ellen
    Corriveau-Lecavalier, Nick
    Mester, Carly T.
    Wiste, Heather J.
    Lundt, Emily S.
    Murray, Melissa E.
    Nguyen, Aivi T.
    Reichard, Ross R.
    Botha, Hugo
    Graff-Radford, Jonathan
    Barnard, Leland R.
    Gunter, Jeffrey L.
    Schwarz, Christopher G.
    Kantarci, Kejal
    Knopman, David S.
    Boeve, Bradley F.
    Lowe, Val J.
    Petersen, Ronald C.
    Jack Jr, Clifford R.
    Jones, David T.
    BRAIN, 2024, 147 (03) : 980 - 995
  • [16] Beyond images: an integrative multi-modal approach to chest x-ray report generation
    Aksoy, Nurbanu
    Sharoff, Serge
    Baser, Selcuk
    Ravikumar, Nishant
    Frangi, Alejandro F.
    FRONTIERS IN RADIOLOGY, 2024, 4
  • [17] An attention-enhanced multi-modal deep learning algorithm for robotic compound fault diagnosis
    Zhou, Xing
    Zeng, Hanlin
    Chen, Chong
    Xiao, Hong
    Xiang, Zhenlin
    MEASUREMENT SCIENCE AND TECHNOLOGY, 2023, 34 (01)
  • [18] CMAF-Net: a cross-modal attention fusion-based deep neural network for incomplete multi-modal brain tumor segmentation
    Sun, Kangkang
    Ding, Jiangyi
    Li, Qixuan
    Chen, Wei
    Zhang, Heng
    Sun, Jiawei
    Jiao, Zhuqing
    Ni, Xinye
    QUANTITATIVE IMAGING IN MEDICINE AND SURGERY, 2024, 14 (07) : 4579 - 4604
  • [19] Simulating cross-modal medical images using multi-task adversarial learning of a deep convolutional neural network
    Kumar, Vikas
    Sharma, Manoj
    Jehadeesan, R.
    Venkatraman, B.
    Sheet, Debdoot
    INTERNATIONAL JOURNAL OF IMAGING SYSTEMS AND TECHNOLOGY, 2024, 34 (04)
  • [20] Novelty detection of foreign objects in food using multi-modal X-ray imaging
    Einarsdottir, Hildur
    Emerson, Monica Jane
    Clemmensen, Line Harder
    Scherer, Kai
    Willer, Konstantin
    Bech, Martin
    Larsen, Rasmus
    Ersboll, Bjarne Kjaer
    Pfeiffer, Franz
    FOOD CONTROL, 2016, 67 : 39 - 47