Adapting Visual-Language Models for Generalizable Anomaly Detection in Medical Images

Times cited: 4
Authors
Huang, Chaoqin [1, 2, 3]
Jiang, Aofan [1, 3]
Feng, Jinghao [1, 3]
Zhang, Ya [1, 3]
Wang, Xinchao [2]
Wang, Yanfeng [1, 3]
Affiliations
[1] Shanghai Jiao Tong Univ, Shanghai, Peoples R China
[2] Natl Univ Singapore, Singapore, Singapore
[3] Shanghai Artificial Intelligence Lab, Shanghai, Peoples R China
Funding
National Key Research and Development Program of China;
Keywords
DOI
10.1109/CVPR52733.2024.01081
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
Recent advancements in large-scale visual-language pre-trained models have led to significant progress in zero-/few-shot anomaly detection within natural image domains. However, the substantial domain divergence between natural and medical images limits the effectiveness of these methodologies in medical anomaly detection. This paper introduces a novel lightweight multi-level adaptation and comparison framework to repurpose the CLIP model for medical anomaly detection. Our approach integrates multiple residual adapters into the pre-trained visual encoder, enabling a stepwise enhancement of visual features across different levels. This multi-level adaptation is guided by multi-level, pixel-wise visual-language feature alignment loss functions, which recalibrate the model's focus from object semantics in natural imagery to anomaly identification in medical images. The adapted features exhibit improved generalization across various medical data types, even in zero-shot scenarios where the model encounters unseen medical modalities and anatomical regions during training. Our experiments on medical anomaly detection benchmarks demonstrate that our method significantly surpasses current state-of-the-art models, with an average AUC improvement of 6.24% and 7.33% for anomaly classification, 2.03% and 2.37% for anomaly segmentation, under the zero-shot and few-shot settings, respectively. Source code is available at: https://github.com/MediaBrain-SJTU/MVFA-AD
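The two mechanisms named in the abstract, residual adapters on visual features and pixel-wise visual-language comparison, can be illustrated with a minimal sketch. This is not the authors' implementation (see the repository linked above for that). The function names, the bottleneck adapter shape, the blend ratio `gamma`, and the softmax `temperature` are all assumptions chosen for illustration.

```python
import math

def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def residual_adapter(x, W_down, W_up, gamma=0.1):
    """Hypothetical residual adapter: a small bottleneck MLP whose output
    is blended with the frozen encoder feature x.
    gamma is an assumed blend ratio, not a value taken from the paper."""
    hidden = [max(0.0, h) for h in matvec(W_down, x)]  # down-project + ReLU
    adapted = matvec(W_up, hidden)                     # project back up
    return [gamma * a + (1.0 - gamma) * v for a, v in zip(adapted, x)]

def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(p * q for p, q in zip(a, b))
    norm_a = math.sqrt(sum(p * p for p in a))
    norm_b = math.sqrt(sum(q * q for q in b))
    return dot / (norm_a * norm_b)

def anomaly_probability(patch, text_normal, text_abnormal, temperature=0.07):
    """Compare one patch feature with 'normal' and 'abnormal' text
    embeddings and return the softmax probability of 'abnormal'.
    temperature is an assumed scaling constant."""
    s_normal = cosine(patch, text_normal) / temperature
    s_abnormal = cosine(patch, text_abnormal) / temperature
    m = max(s_normal, s_abnormal)                      # for numerical stability
    e_normal = math.exp(s_normal - m)
    e_abnormal = math.exp(s_abnormal - m)
    return e_abnormal / (e_normal + e_abnormal)

if __name__ == "__main__":
    # Toy 2-D patch feature and toy adapter weights (all values made up).
    feature = residual_adapter([1.0, 2.0], [[1.0, 0.0]], [[1.0], [1.0]], gamma=0.5)
    print(feature)
    print(anomaly_probability(feature, [0.0, 1.0], [1.0, 0.0]))
```

In a multi-level setup of the kind the abstract describes, one such adapter would sit at each of several encoder depths, and the per-patch probabilities from each level would be combined into an anomaly map; how they are combined is left open here.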
Pages: 11375 - 11385
Number of pages: 11
Related papers
  • [1] Visual-language foundation models in medicine
    Liu, Chunyu
    Jin, Yixiao
    Guan, Zhouyu
    Li, Tingyao
    Qin, Yiming
    Qian, Bo
    Jiang, Zehua
    Wu, Yilan
    Wang, Xiangning
    Zheng, Ying Feng
    Zeng, Dian
    VISUAL COMPUTER, 2025, 41 (04) : 2953 - 2972
  • [2] Reconsidering learnable fine-grained text prompts for few-shot anomaly detection in visual-language models
    Han, Delong
    Xu, Luo
    Zhou, Mingle
    Wan, Jin
    Li, Min
    Li, Gang
    NEURAL NETWORKS, 2025, 182
  • [3] VTPL: Visual and text prompt learning for visual-language models
    Sun, Bo
    Wu, Zhichao
    Zhang, Hao
    He, Jun
    JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2024, 104
  • [4] Prompting Visual-Language Models for Efficient Video Understanding
    Ju, Chen
    Han, Tengda
    Zheng, Kunhao
    Zhang, Ya
    Xie, Weidi
    COMPUTER VISION - ECCV 2022, PT XXXV, 2022, 13695 : 105 - 124
  • [5] Towards Generalizable Network Anomaly Detection Models
    Arifuzzaman, Md
    Islam, Shafkat
    Arslan, Engin
    PROCEEDINGS OF THE IEEE 46TH CONFERENCE ON LOCAL COMPUTER NETWORKS (LCN 2021), 2021 : 375 - 378
  • [6] Most and Least Retrievable Images in Visual-Language Query Systems
    Zhu, Liuwan
    Ning, Rui
    Li, Jiang
    Xin, Chunsheng
    Wu, Hongyi
    COMPUTER VISION - ECCV 2022, PT XXXVII, 2022, 13697 : 1 - 18
  • [7] Enhancing Visual Document Understanding with Contrastive Learning in Large Visual-Language Models
    Li, Xin
    Wu, Yunfei
    Jiang, Xinghua
    Guo, Zhihao
    Gong, Mingming
    Cao, Haoyu
    Liu, Yinsong
    Jiang, Deqiang
    Sun, Xing
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2024 : 15546 - 15555
  • [8] High Efficiency Image Compression for Large Visual-Language Models
    Li, Binzhe
    Wang, Shurun
    Wang, Shiqi
    Ye, Yan
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2025, 35 (03) : 2870 - 2880
  • [9] The role of noise in denoising models for anomaly detection in medical images
    Kascenas, Antanas
    Sanchez, Pedro
    Schrempf, Patrick
    Wang, Chaoyang
    Clackett, William
    Mikhael, Shadia S.
    Voisey, Jeremy P.
    Goatman, Keith
    Weir, Alexander
    Pugeault, Nicolas
    Tsaftaris, Sotirios A.
    O'Neil, Alison Q.
    MEDICAL IMAGE ANALYSIS, 2023, 90
  • [10] Zero-Shot Nuclei Detection via Visual-Language Pre-trained Models
    Wu, Yongjian
    Zhou, Yang
    Saiyin, Jiya
    Wei, Bingzheng
    Lai, Maode
    Shou, Jianzhong
    Fan, Yubo
    Xu, Yan
    MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION, MICCAI 2023, PT VI, 2023, 14225 : 693 - 703