Rethinking masked image modelling for medical image representation

Cited by: 1
Authors
Xie, Yutong [1]
Gu, Lin [2,3]
Harada, Tatsuya [2,3]
Zhang, Jianpeng [4]
Xia, Yong [5,6]
Wu, Qi [1]
Affiliations
[1] Univ Adelaide, Adelaide, Australia
[2] RIKEN AIP, Tokyo, Japan
[3] Univ Tokyo, RCAST, Tokyo, Japan
[4] Zhejiang Univ, Coll Comp Sci & Technol, Hangzhou, Peoples R China
[5] Northwestern Polytech Univ, Sch Comp Sci & Engn, Xian 710072, Peoples R China
[6] Northwestern Polytech Univ, Ningbo Inst, Ningbo 315048, Peoples R China
Keywords
Medical image representations; Masked image modelling; Visual-language pre-training; TRANSFORMER;
DOI
10.1016/j.media.2024.103304
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Masked Image Modelling (MIM), a form of self-supervised learning, has garnered significant success in computer vision by improving image representations using unannotated data. Traditional MIMs typically employ a strategy of random sampling across the image. However, this random masking technique may not be ideally suited for medical imaging, which possesses distinct characteristics divergent from natural images. In medical imaging, particularly in pathology, disease-related features are often exceedingly sparse and localized, while the remaining regions appear normal and undifferentiated. Additionally, medical images frequently accompany reports, directly pinpointing pathological changes' location. Inspired by this, we propose Masked medical Image Modelling (MedIM), a novel approach, to our knowledge, the first research that employs radiological reports to guide the masking and restore the informative areas of images, encouraging the network to explore the stronger semantic representations from medical images. We introduce two mutual comprehensive masking strategies, knowledge-driven masking (KDM), and sentence-driven masking (SDM). KDM uses Medical Subject Headings (MeSH) words unique to radiology reports to identify symptom clues mapped to MeSH words (e.g., cardiac, edema, vascular, pulmonary) and guide the mask generation. Recognizing that radiological reports often comprise several sentences detailing varied findings, SDM integrates sentence-level information to identify key regions for masking. MedIM reconstructs images informed by this masking from the KDM and SDM modules, promoting a comprehensive and enriched medical image representation. Our extensive experiments on seven downstream tasks covering multi-label/class image classification, pneumothorax segmentation, and medical image-report analysis, demonstrate that MedIM with report-guided masking achieves competitive performance.
Our method substantially outperforms ImageNet pre-training, MIM-based pre-training, and medical image-report pre-training counterparts. Codes are available at https://github.com/YtongXie/MedIM.
Pages: 11
Related papers
50 in total
  • [1] Contrastive Masked Image-Text Modeling for Medical Visual Representation Learning
    Chen, Cheng
    Zhong, Aoxiao
    Wu, Dufan
    Luo, Jie
    Li, Quanzheng
    MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION, MICCAI 2023, PT V, 2023, 14224 : 493 - 503
  • [2] Masked Image Modelling for Retinal OCT Understanding
    Pissas, Theodoros
    Marquez-Neila, Pablo
    Wolfe, Sebastian
    Zinkernagel, Martin
    Sznitman, Raphael
    OPHTHALMIC MEDICAL IMAGE ANALYSIS, OMIA 2024, 2025, 15188 : 115 - 125
  • [3] RETHINKING REPRESENTATION, THE 'POWER OF THE IMAGE' + BY KUHN,ANNETTE
    SWANSON, G
    SCREEN, 1986, 27 (05) : 16 - 28
  • [4] Generalizable stereo depth estimation with masked image modelling
    Tukra, Samyakh
    Xu, Haozheng
    Xu, Chi
    Giannarou, Stamatia
    HEALTHCARE TECHNOLOGY LETTERS, 2024, 11 (2-3) : 108 - 116
  • [5] Masked Image Modeling Advances 3D Medical Image Analysis
    Chen, Zekai
    Agarwal, Devansh
    Aggarwal, Kshitij
    Safta, Wiem
    Balan, Mariann Micsinai
    Brown, Kevin
    2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2023, : 1969 - 1979
  • [6] A Mapping Modelling of Visual Feature and Knowledge Representation Approach for Medical Image Retrieval
    Li Jin
    Liang Hong
    Tang Lianzhi
    2009 IEEE INTERNATIONAL CONFERENCE ON MECHATRONICS AND AUTOMATION, VOLS 1-7, CONFERENCE PROCEEDINGS, 2009, : 1778 - +
  • [7] Knowledge representation for image content analysis in medical image database
    Luo, H
    Gaborski, R
    Acharya, R
    MEDICAL IMAGING: 2001: IMAGE PROCESSING, PTS 1-3, 2001, 4322 : 1035 - 1045
  • [8] Rethinking Feature Guidance for Medical Image Segmentation
    Wang, Wei
    He, Jixing
    Wang, Xin
    IEEE SIGNAL PROCESSING LETTERS, 2025, 32 : 641 - 645
  • [9] Rethinking Dice Loss for Medical Image Segmentation
    Zhao, Rongjian
    Qian, Buyue
    Zhang, Xianli
    Li, Yang
    Wei, Rong
    Liu, Yang
    Pan, Yinggang
    20TH IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM 2020), 2020, : 851 - 860
  • [10] Multi-View Masked Autoencoder for General Image Representation
    Ji, Seungbin
    Han, Sangkwon
    Rhee, Jongtae
    APPLIED SCIENCES-BASEL, 2023, 13 (22):