Reconsidering learnable fine-grained text prompts for few-shot anomaly detection in visual-language models

Cited by: 0

Authors:

Han, Delong [1,2]

Xu, Luo [1,2]

Zhou, Mingle [1,2]

Wan, Jin [1,2]

Li, Min [1,2]

Li, Gang [1,2]

Affiliations:

[1] Qilu Univ Technol, Key Lab Comp Power Network & Informat Secur, Minist Educ,Shandong Comp Sci Ctr, Natl Supercomp Ctr Jinan,Shandong Acad Sci, Jinan 250353, Shandong, Peoples R China

[2] Shandong Fundamental Res Ctr Comp Sci, Shandong Prov Key Lab Comp Power Internet & Serv C, Jinan 250014, Peoples R China

Source:

NEURAL NETWORKS | 2025 / Vol. 182

Keywords:

Industrial anomaly detection; Fine-grained text prompts; Few-shot anomaly detection; Pre-trained visual-language models

DOI:

10.1016/j.neunet.2024.106906

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Few-Shot Anomaly Detection (FSAD) in industrial images aims to identify abnormalities using only a few normal images, which is crucial for industrial scenarios where training samples are limited. Recent advances in large-scale pre-trained visual-language models have brought significant improvements to FSAD, but these approaches typically require hundreds of text prompts to be manually crafted through prompt engineering. Such manually designed text prompts cannot accurately match the informative features of different categories across diverse images, and the domain gap between training and test datasets can severely impair the generalization capability of the text prompts. To address these issues, we propose a visual-language model based on fine-grained learnable text prompts as a unified general framework for industrial FSAD. First, we design a Fine-grained Text Prompts Adapter (FTPA) and an associated registration loss to enhance the efficiency of text prompts. The manually designed text prompts are refined by capturing normal and abnormal semantic information in the image, so that they describe the image semantics at a finer granularity. In addition, we introduce a Dynamic Modulation Mechanism (DMM) to avoid potential errors in the trained text prompts that arise because the target data are unknown during cross-dataset detection. This is achieved by explicitly modulating the branch guided by few-shot images and the branch guided by fine-grained text prompts. Extensive experiments demonstrate that the proposed method achieves state-of-the-art few-shot industrial anomaly detection and segmentation performance. In the 4-shot setting, the AUROC for anomaly classification and anomaly segmentation reaches 98.3% and 96.3% on MVTec-AD, and 93.8% and 97.9% on VisA, respectively.
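
The abstract reports results as AUROC. As a reminder of what that metric measures, the following is a minimal sketch, not the authors' evaluation code: it computes AUROC as the fraction of (anomalous, normal) pairs in which the anomalous sample receives the higher score, with ties counted as half. The function name `auroc` and the example scores are hypothetical and are not taken from the paper.

```python
def auroc(scores, labels):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney U).

    scores: anomaly scores, where higher means more anomalous.
    labels: 1 for anomalous samples, 0 for normal samples.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one normal and one anomalous sample")
    # Count correctly ordered (anomalous, normal) pairs; a tie counts as half.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical image-level scores for six test images (not data from the paper).
scores = [0.10, 0.35, 0.20, 0.80, 0.30, 0.90]
labels = [0, 0, 0, 1, 1, 1]
print(auroc(scores, labels))
```

For anomaly classification the scores would be one value per image; for anomaly segmentation the same computation would apply to per-pixel scores against a ground-truth mask. The quadratic pairwise loop is chosen for readability and would be too slow at pixel scale, where a rank-based implementation is the usual choice.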

Pages: 12
