Reconsidering learnable fine-grained text prompts for few-shot anomaly detection in visual-language models

Cited by: 0
Authors
Han, Delong [1, 2]
Xu, Luo [1, 2]
Zhou, Mingle [1, 2]
Wan, Jin [1, 2]
Li, Min [1, 2]
Li, Gang [1, 2]
Affiliations
[1] Qilu Univ Technol, Key Lab Comp Power Network & Informat Secur, Minist Educ,Shandong Comp Sci Ctr, Natl Supercomp Ctr Jinan,Shandong Acad Sci, Jinan 250353, Shandong, Peoples R China
[2] Shandong Fundamental Res Ctr Comp Sci, Shandong Prov Key Lab Comp Power Internet & Serv C, Jinan 250014, Peoples R China
Keywords
Industrial anomaly detection; Fine-grained text prompts; Few-Shot Anomaly Detection; Pre-trained visual-language models;
DOI
10.1016/j.neunet.2024.106906
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Few-Shot Anomaly Detection (FSAD) in industrial images aims to identify abnormalities using only a few normal images, which is crucial for industrial scenarios where training samples are limited. Recent advances in large-scale pre-trained visual-language models have brought significant improvements to FSAD, but these approaches typically require hundreds of text prompts to be manually crafted through prompt engineering. Manually designed text prompts cannot accurately match the informative features of different categories across diverse images, and the domain gap between training and test datasets can severely impair the generalization capability of the text prompts. To address these issues, we propose a visual-language model based on fine-grained learnable text prompts as a unified general framework for industrial FSAD. First, we design a Fine-grained Text Prompts Adapter (FTPA) and an associated registration loss to enhance the efficiency of text prompts. The manually designed text prompts are refined by capturing normal and abnormal semantic information in the image, so that they describe the image semantics at a finer granularity. In addition, we introduce a Dynamic Modulation Mechanism (DMM) to avoid potential errors in the trained text prompts that arise because the target dataset is unseen during cross-dataset detection. This is achieved by explicitly modulating the branch guided by few-shot images and the branch guided by fine-grained text prompts. Extensive experiments demonstrate that the proposed method achieves state-of-the-art few-shot industrial anomaly detection and segmentation performance. In the 4-shot setting, the AUROC for anomaly classification and anomaly segmentation reaches 98.3% and 96.3% on MVTec-AD, and 93.8% and 97.9% on VisA.
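The abstract describes two scoring branches, one guided by text prompts and one guided by few-shot normal images, which are then modulated against each other. The sketch below illustrates only that general two-branch idea and is not the paper's FTPA or DMM, whose details the abstract does not give. It assumes CLIP-style cosine-similarity scoring, a softmax temperature of 0.07, and a fixed mixing weight `alpha` in place of any dynamic modulation; all function names and parameters are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def text_branch_score(image_feat, normal_prompt, abnormal_prompt, temperature=0.07):
    """Softmax probability assigned to the 'abnormal' prompt embedding.

    Assumption: CLIP-style scoring of one image feature against one normal
    and one abnormal prompt embedding.
    """
    s_n = cosine(image_feat, normal_prompt) / temperature
    s_a = cosine(image_feat, abnormal_prompt) / temperature
    m = max(s_n, s_a)  # subtract the max for numerical stability
    e_n, e_a = math.exp(s_n - m), math.exp(s_a - m)
    return e_a / (e_n + e_a)

def few_shot_branch_score(image_feat, reference_feats):
    """Dissimilarity to the closest normal reference feature, mapped to [0, 1]."""
    best = max(cosine(image_feat, r) for r in reference_feats)
    return (1.0 - best) / 2.0

def fused_score(image_feat, normal_prompt, abnormal_prompt, reference_feats, alpha=0.5):
    """Fixed convex combination of the two branches.

    Assumption: a constant weight stands in for a learned or dynamic one.
    """
    t = text_branch_score(image_feat, normal_prompt, abnormal_prompt)
    f = few_shot_branch_score(image_feat, reference_feats)
    return alpha * t + (1.0 - alpha) * f

# Toy 2-D features: the first axis stands for "normal", the second for "abnormal".
normal_prompt, abnormal_prompt = [1.0, 0.0], [0.0, 1.0]
references = [[1.0, 0.0], [0.9, 0.1]]  # features of the few normal shots
score_normal = fused_score([1.0, 0.05], normal_prompt, abnormal_prompt, references)
score_anomalous = fused_score([0.1, 1.0], normal_prompt, abnormal_prompt, references)
```

With these toy vectors the anomalous feature receives the higher fused score. In a real pipeline the features would come from the image and text encoders of a pre-trained visual-language model rather than being written by hand.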
Pages: 12