Reconsidering learnable fine-grained text prompts for few-shot anomaly detection in visual-language models

Cited by: 0
Authors
Han, Delong [1, 2]
Xu, Luo [1, 2]
Zhou, Mingle [1, 2]
Wan, Jin [1, 2]
Li, Min [1, 2]
Li, Gang [1, 2]
Affiliations
[1] Qilu Univ Technol, Key Lab Comp Power Network & Informat Secur, Minist Educ,Shandong Comp Sci Ctr, Natl Supercomp Ctr Jinan,Shandong Acad Sci, Jinan 250353, Shandong, Peoples R China
[2] Shandong Fundamental Res Ctr Comp Sci, Shandong Prov Key Lab Comp Power Internet & Serv C, Jinan 250014, Peoples R China
Keywords
Industrial anomaly detection; Fine-grained text prompts; Few-shot anomaly detection; Pre-trained visual-language models
DOI
10.1016/j.neunet.2024.106906
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Few-Shot Anomaly Detection (FSAD) in industrial images aims to identify abnormalities using only a few normal images, which is crucial for industrial scenarios where training samples are limited. Recent advances in large-scale pre-trained visual-language models have brought significant improvements to FSAD, but these approaches typically require hundreds of text prompts to be manually crafted through prompt engineering. However, manually designed text prompts cannot accurately match the informative features of different categories across diverse images, and the domain gap between training and test datasets can severely impair the generalization capability of text prompts. To address these issues, we propose a visual-language model based on fine-grained learnable text prompts as a unified general framework for industrial FSAD. First, we design a Fine-grained Text Prompts Adapter (FTPA) and an associated registration loss to enhance the efficiency of text prompts. The manually designed text prompts are refined by capturing normal and abnormal semantic information in the image, so that they describe the image semantics at a finer granularity. In addition, we introduce a Dynamic Modulation Mechanism (DMM) to avoid potential errors in the trained text prompts caused by the unknown target data in cross-dataset detection. This is achieved by explicitly modulating the branch guided by few-shot images and the branch guided by fine-grained text prompts. Extensive experiments demonstrate that the proposed method achieves state-of-the-art few-shot industrial anomaly detection and segmentation performance. In the 4-shot setting, the AUROC for anomaly classification and anomaly segmentation reaches 98.3% and 96.3% on MVTec-AD, and 93.8% and 97.9% on VisA.
Pages: 12