In-context learning enables multimodal large language models to classify cancer pathology images

被引:0
|
作者
Dyke Ferber [1 ]
Georg Wölflein [2 ]
Isabella C. Wiest [3 ]
Marta Ligero [4 ]
Srividhya Sainath [3 ]
Narmin Ghaffari Laleh [5 ]
Omar S. M. El Nahhas [3 ]
Gustav Müller-Franzes [3 ]
Dirk Jäger [3 ]
Daniel Truhn [3 ]
Jakob Nikolas Kather [6 ]
机构
[1] Heidelberg University Hospital,National Center for Tumor Diseases (NCT)
[2] Heidelberg University Hospital,Department of Medical Oncology
[3] Technical University Dresden,Else Kroener Fresenius Center for Digital Health
[4] University of St Andrews,School of Computer Science
[5] Heidelberg University,Department of Medicine II, Medical Faculty Mannheim
[6] University Hospital Aachen,Department of Diagnostic and Interventional Radiology
[7] University Hospital Dresden,Department of Medicine I
关键词
D O I
10.1038/s41467-024-51465-9
中图分类号
学科分类号
摘要
Medical image classification requires labeled, task-specific datasets which are used to train deep learning networks de novo, or to fine-tune foundation models. However, this process is computationally and technically demanding. In language processing, in-context learning provides an alternative, where models learn from within prompts, bypassing the need for parameter updates. Yet, in-context learning remains underexplored in medical image analysis. Here, we systematically evaluate the model Generative Pretrained Transformer 4 with Vision capabilities (GPT-4V) on cancer image processing with in-context learning on three cancer histopathology tasks of high importance: Classification of tissue subtypes in colorectal cancer, colon polyp subtyping and breast tumor detection in lymph node sections. Our results show that in-context learning is sufficient to match or even outperform specialized neural networks trained for particular tasks, while only requiring a minimal number of samples. In summary, this study demonstrates that large vision language models trained on non-domain specific data can be applied out-of-the box to solve medical image-processing tasks in histopathology. This democratizes access of generalist AI models to medical experts without technical background especially for areas where annotated data is scarce.
引用
收藏
相关论文
共 50 条
  • [31] A survey on multimodal large language models
    Shukang Yin
    Chaoyou Fu
    Sirui Zhao
    Ke Li
    Xing Sun
    Tong Xu
    Enhong Chen
    National Science Review, 2024, 11 (12) : 277 - 296
  • [32] Large-scale Lifelong Learning of In-context Instructions and How to Tackle It
    Mok, Jisoo
    Do, Jaeyoung
    Lee, Sungjin
    Taghavi, Tara
    Yu, Seunghak
    Yoon, Sungroh
    PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023): LONG PAPERS, VOL 1, 2023, : 12573 - 12589
  • [33] Applications of Large Language Models in Pathology
    Cheng, Jerome
    BIOENGINEERING-BASEL, 2024, 11 (04):
  • [34] Understanding Naturalistic Facial Expressions with Deep Learning and Multimodal Large Language Models
    Bian, Yifan
    Kuester, Dennis
    Liu, Hui
    Krumhuber, Eva G.
    SENSORS, 2024, 24 (01)
  • [35] From Large Language Models to Large Multimodal Models: A Literature Review
    Huang, Dawei
    Yan, Chuan
    Li, Qing
    Peng, Xiaojiang
    APPLIED SCIENCES-BASEL, 2024, 14 (12):
  • [36] Data augmentation and transfer learning to classify malware images in a deep learning context
    Niccolò Marastoni
    Roberto Giacobazzi
    Mila Dalla Preda
    Journal of Computer Virology and Hacking Techniques, 2021, 17 : 279 - 297
  • [37] SINC: Self-Supervised In-Context Learning for Vision-Language Tasks
    Chen, Yi-Syuan
    Song, Yun-Zhu
    Yeo, Cheng Yu
    Liu, Bei
    Fu, Jianlong
    Shuai, Hong-Han
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 15384 - 15396
  • [38] Data augmentation and transfer learning to classify malware images in a deep learning context
    Marastoni, Niccolo
    Giacobazzi, Roberto
    Dalla Preda, Mila
    JOURNAL OF COMPUTER VIROLOGY AND HACKING TECHNIQUES, 2021, 17 (04) : 279 - 297
  • [39] Evaluation of Code Generation for Simulating Participant Behavior in Experience Sampling Method by Iterative In-Context Learning of a Large Language Model
    Khanshan A.
    Van Gorp P.
    Markopoulos P.
    Proceedings of the ACM on Human-Computer Interaction, 2024, 8 (EICS)
  • [40] Applying YOLOv6 as an ensemble federated learning framework to classify breast cancer pathology images
    Chhaya Gupta
    Nasib Singh Gill
    Preeti Gulia
    Noha Alduaiji
    J. Shreyas
    Piyush Kumar Shukla
    Scientific Reports, 15 (1)