Enhancing glaucoma detection through multi-modal integration of retinal images and clinical biomarkers

Authors
Sivakumar, Rishikesh [1]
Penkova, Anita [2]
Affiliations
[1] Univ Southern Calif, Dept Comp Sci, Los Angeles, CA 90089 USA
[2] Univ Southern Calif, Dept Aerosp & Mech Engn, Los Angeles, CA 90089 USA
Keywords
Glaucoma detection; Vision Transformers; Convolutional Neural Networks; Machine Learning; Clinical Biomarkers; Automated Extraction; Framework
DOI
10.1016/j.engappai.2025.110010
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Glaucoma, a major cause of irreversible blindness globally, often progresses without early symptoms, making prompt and precise detection vital. This paper introduces a multi-modal glaucoma detection system that combines advanced deep learning architectures to analyze retinal images and clinical biomarkers. We developed three hybrid models: the first blends Vision Transformers (ViT) with Convolutional Neural Networks (CNN), specifically Residual Networks (ResNet), for comprehensive feature extraction; the second uses ObjectWindow-Location Vision Transformer (OWL-ViT) with Residual Networks for enhanced global contextual insights; and the third employs a Hierarchical Vision Transformer using Shifted Windows (Swin Transformer) with Residual Networks, which demonstrated the best performance. These models combine complementary strengths: broad contextual capture by ViT, localized detail extraction by CNNs, and refined granularity by the Swin Transformer. Together, these strengths improve both feature representation and computational efficiency, making the models well suited for clinical use. The best-performing system, built on the Swin Transformer hybrid model, achieved an F1-score of 0.993 for glaucoma and 0.995 for non-glaucoma, with an overall accuracy of 99.4% on a dataset of 2874 new cases, correctly classifying 2857 of them. These results support its efficacy in early-stage glaucoma detection and mark a significant advance over existing methods.
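The reported accuracy can be checked directly from the two counts given in the abstract. The sketch below is a minimal illustration, not the authors' evaluation code: the function names `accuracy` and `f1_score` are hypothetical, and the abstract does not give per-class confusion counts, so the F1 call uses toy numbers that are not taken from the paper.

```python
def accuracy(correct: int, total: int) -> float:
    """Fraction of cases classified correctly."""
    if total <= 0:
        raise ValueError("total must be positive")
    return correct / total


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Counts stated in the abstract.
reported_total = 2874
reported_correct = 2857

reported_accuracy = accuracy(reported_correct, reported_total)
misclassified = reported_total - reported_correct

print(f"accuracy = {reported_accuracy:.1%}")   # rounds to 99.4%
print(f"misclassified cases = {misclassified}")

# Toy counts for illustration only (NOT from the paper): the abstract
# reports per-class F1-scores but not the underlying TP/FP/FN values.
toy_f1 = f1_score(tp=8, fp=1, fn=1)
print(f"toy F1 = {toy_f1:.3f}")
```

The accuracy figure is consistent with the stated counts: 2857 of 2874 cases leaves 17 misclassified, which rounds to 99.4%.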
Pages: 13