Rethinking masked image modelling for medical image representation

Cited by: 1
Authors
Xie, Yutong [1]
Gu, Lin [2, 3]
Harada, Tatsuya [2, 3]
Zhang, Jianpeng [4]
Xia, Yong [5, 6]
Wu, Qi [1]
Affiliations
[1] Univ Adelaide, Adelaide, Australia
[2] RIKEN AIP, Tokyo, Japan
[3] Univ Tokyo, RCAST, Tokyo, Japan
[4] Zhejiang Univ, Coll Comp Sci & Technol, Hangzhou, Peoples R China
[5] Northwestern Polytech Univ, Sch Comp Sci & Engn, Xian 710072, Peoples R China
[6] Northwestern Polytech Univ, Ningbo Inst, Ningbo 315048, Peoples R China
Keywords
Medical image representations; Masked image modelling; Visual-language pre-training; TRANSFORMER;
DOI
10.1016/j.media.2024.103304
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Masked Image Modelling (MIM), a form of self-supervised learning, has garnered significant success in computer vision by improving image representations using unannotated data. Traditional MIMs typically employ a strategy of random sampling across the image. However, this random masking technique may not be ideally suited for medical imaging, which possesses distinct characteristics divergent from natural images. In medical imaging, particularly in pathology, disease-related features are often exceedingly sparse and localized, while the remaining regions appear normal and undifferentiated. Additionally, medical images are frequently accompanied by reports that directly pinpoint the location of pathological changes. Inspired by this, we propose Masked medical Image Modelling (MedIM), a novel approach and, to our knowledge, the first study to employ radiological reports to guide the masking and restore the informative areas of images, encouraging the network to explore stronger semantic representations from medical images. We introduce two mutually comprehensive masking strategies: knowledge-driven masking (KDM) and sentence-driven masking (SDM). KDM uses Medical Subject Headings (MeSH) words unique to radiology reports to identify symptom clues mapped to MeSH words (e.g., cardiac, edema, vascular, pulmonary) and guide the mask generation. Recognizing that radiological reports often comprise several sentences detailing varied findings, SDM integrates sentence-level information to identify key regions for masking. MedIM reconstructs images informed by this masking from the KDM and SDM modules, promoting a comprehensive and enriched medical image representation. Our extensive experiments on seven downstream tasks, covering multi-label/class image classification, pneumothorax segmentation, and medical image-report analysis, demonstrate that MedIM with report-guided masking achieves competitive performance. Our method substantially outperforms ImageNet pre-training, MIM-based pre-training, and medical image-report pre-training counterparts. Code is available at https://github.com/YtongXie/MedIM.
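The contrast the abstract draws between random masking and report-guided masking can be illustrated with a small sketch. This is not the authors' implementation (see the repository linked above for that). It assumes each image patch already has a relevance score with respect to the report text, and it masks the highest-scoring patches. The names `random_mask`, `guided_mask`, and `patch_scores`, as well as the top-k selection rule, are hypothetical and chosen only for illustration.

```python
import random


def random_mask(num_patches, mask_ratio, seed=0):
    """Baseline MIM-style masking: pick patches uniformly at random."""
    num_masked = round(num_patches * mask_ratio)
    chosen = set(random.Random(seed).sample(range(num_patches), num_masked))
    return [i in chosen for i in range(num_patches)]


def guided_mask(patch_scores, mask_ratio):
    """Hypothetical report-guided masking: mask the most relevant patches.

    patch_scores[i] is an ASSUMED relevance of patch i to the report
    (for example, a similarity to keyword or sentence embeddings).
    How such scores are obtained is outside the scope of this sketch.
    """
    num_masked = round(len(patch_scores) * mask_ratio)
    # Rank patch indices from highest to lowest score (stable for ties).
    ranked = sorted(range(len(patch_scores)), key=lambda i: -patch_scores[i])
    chosen = set(ranked[:num_masked])
    return [i in chosen for i in range(len(patch_scores))]


# Toy example: 8 patches, half of them masked.
scores = [0.1, 0.9, 0.2, 0.8, 0.05, 0.3, 0.7, 0.15]
guided = guided_mask(scores, mask_ratio=0.5)
baseline = random_mask(len(scores), mask_ratio=0.5)
```

In this toy example the guided mask always covers the four highest-scoring patches, whereas the random baseline covers four patches regardless of their scores, so sparse informative regions may be left unmasked.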
Pages: 11
Related papers
50 in total
  • [21] IMAGE RESTORATION BY MIXTURE MODELLING OF AN OVERCOMPLETE LINEAR REPRESENTATION
    Mancera, L.
    Babacan, S. Derin
    Molina, R.
    Katsaggelos, A. K.
    2009 16TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, VOLS 1-6, 2009, : 3949 - +
  • [22] Medical image compression using attention modelling
    Maeder, AJ
    MEDICAL IMAGING 1999: IMAGE PERCEPTION AND PERFORMANCE, 1999, 3663 : 129 - 135
  • [23] DRCL: rethinking jigsaw puzzles for unsupervised medical image segmentation
    Ni, Jian
    Wang, Zheng
    Wang, Yixiao
    Tao, Wenjian
    Shen, Ao
    VISUAL COMPUTER, 2024,
  • [24] Structural Attention: Rethinking Transformer for Unpaired Medical Image Synthesis
    Vu Minh Hieu Phan
    Xie, Yutong
    Zhang, Bowen
    Qi, Yuankai
    Liao, Zhibin
    Perperidis, Antonios
    Phung, Son Lam
    Verjans, Johan W.
    To, Minh-Son
    MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION - MICCAI 2024, PT VII, 2024, 15007 : 690 - 700
  • [25] Rethinking Disentanglement in Unsupervised Domain Adaptation for Medical Image Segmentation
    Wang, Yan
    Chen, Yixin
    Zhang, Yingying
    Zhu, Haogang
    2023 45TH ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE & BIOLOGY SOCIETY, EMBC, 2023,
  • [26] Rethinking the Necessity of Learnable Modal Alignment for Medical Image Fusion
    Li, Min
    Li, Feng
    Zuo, Enguang
    Lv, Xiaoyi
    Chen, Chen
    Chen, Cheng
    PATTERN RECOGNITION AND COMPUTER VISION, PT V, PRCV 2024, 2025, 15035 : 596 - 610
  • [27] Deblurring masked image modeling for ultrasound image analysis
    Kang, Qingbo
    Lao, Qicheng
    Gao, Jun
    Liu, Jingyan
    Yi, Huahui
    Ma, Buyun
    Zhang, Xiaofan
    Li, Kang
    MEDICAL IMAGE ANALYSIS, 2024, 97
  • [28] Masked Image Training for Generalizable Deep Image Denoising
    Chen, Haoyu
    Gu, Jinjin
    Liu, Yihao
    Magid, Salma Abdel
    Dong, Chao
    Wang, Qiong
    Pfister, Hanspeter
    Zhu, Lei
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR, 2023, : 1692 - 1703
  • [29] Medical Image Binarization Using Square Wave Representation
    Somasundaram, K.
    Kalavathi, P.
    CONTROL, COMPUTATION AND INFORMATION SYSTEMS, 2011, 140 : 152 - 158
  • [30] Medical image classification via multiscale representation learning
    Tang, Qiling
    Liu, Yangyang
    Liu, Haihua
    ARTIFICIAL INTELLIGENCE IN MEDICINE, 2017, 79 : 71 - 78