CrackCLIP: Adapting Vision-Language Models for Weakly Supervised Crack Segmentation

Cited: 0
Authors
Liang, Fengjiao [1 ]
Li, Qingyong [1 ,2 ]
Yu, Haomin [3 ]
Wang, Wen [1 ]
Affiliations
[1] Beijing Jiaotong Univ, Key Lab Big Data & Artificial Intelligence Transportation, Minist Educ, Beijing 100044, Peoples R China
[2] Beijing Jiaotong Univ, Frontiers Sci Ctr Smart High Speed Railway Syst, Beijing 100044, Peoples R China
[3] Aalborg Univ, Dept Comp Science, DK-9200 Aalborg, Denmark
Funding
Beijing Natural Science Foundation;
Keywords
weakly supervised crack segmentation; vision-language model; Contrastive Language-Image Pre-Training;
DOI
10.3390/e27020127
CLC Number
O4 [Physics]
Discipline Code
0702;
Abstract
Weakly supervised crack segmentation aims to produce pixel-level crack masks with minimal human annotation, which typically only distinguishes crack patches from normal, crack-free patches. The task is crucial for assessing structural integrity and safety in real-world industrial applications, where manually labeling cracks at the pixel level is labor-intensive and often impractical. To address the challenge of labeling uncertainty, this paper presents CrackCLIP, a novel approach that leverages language prompts to augment semantic context and employs the Contrastive Language-Image Pre-Training (CLIP) model to enhance weakly supervised crack segmentation. First, a gradient-based class activation map is used to generate pixel-level coarse pseudo-labels from a trained crack patch classifier. These coarse pseudo-labels are then used to fine-tune additional linear adapters, which are integrated into the frozen image encoder of CLIP to adapt the model to the specialized task of crack segmentation. Moreover, textual prompts tailored to crack characteristics are fed into the frozen text encoder of CLIP to extract features that encapsulate the semantic essence of cracks. The final crack segmentation is obtained by comparing the similarity between the text prompt features and the visual patch token features. Comparative experiments on the Crack500, CFD, and DeepCrack datasets demonstrate that the proposed framework outperforms existing weakly supervised crack segmentation methods, and that the pre-trained vision-language model shows strong potential for crack feature learning, enhancing both the performance and the generalization capability of the proposed framework.
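The pipeline described in the abstract lends itself to a compact sketch. The following PyTorch snippet illustrates only the final matching step: adapted visual patch tokens are scored against crack/no-crack prompt embeddings to produce a pixel-level mask. The encoder outputs are stubbed with random tensors, and the embedding dimension, patch grid, prompt wording, and adapter shape are illustrative assumptions, not the authors' released implementation.

# Minimal sketch of the text/patch-token matching described in the abstract.
# Hypothetical stand-ins: real inputs would be per-patch tokens from CLIP's
# frozen image encoder and prompt embeddings from its frozen text encoder;
# only the linear adapter would be trained on the CAM-derived pseudo-labels.
import torch
import torch.nn as nn
import torch.nn.functional as F

D = 512  # assumed CLIP embedding dimension (ViT-B-style encoder)
P = 14   # assumed patch grid, so one image yields P * P visual tokens

class LinearAdapter(nn.Module):
    """Trainable linear layer inserted after the frozen image encoder."""
    def __init__(self, dim: int):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # Residual adaptation keeps the frozen CLIP features dominant.
        return tokens + self.proj(tokens)

def segment(patch_tokens: torch.Tensor, text_feats: torch.Tensor,
            adapter: LinearAdapter, image_size: int = 224) -> torch.Tensor:
    """Score every visual patch token against the crack/no-crack prompts.

    patch_tokens: (B, P*P, D) tokens from the frozen image encoder.
    text_feats:   (2, D) embeddings of prompts such as
                  ["a photo of a cracked surface", "a photo of an intact surface"].
    """
    tokens = F.normalize(adapter(patch_tokens), dim=-1)
    text = F.normalize(text_feats, dim=-1)
    sims = tokens @ text.t()                    # (B, P*P, 2) cosine similarities
    crack_prob = sims.softmax(dim=-1)[..., 0]   # probability of the "crack" prompt
    mask = crack_prob.view(-1, 1, P, P)
    # Upsample the patch-level map back to a pixel-level segmentation mask.
    return F.interpolate(mask, size=(image_size, image_size),
                         mode="bilinear", align_corners=False)

if __name__ == "__main__":
    adapter = LinearAdapter(D)
    patch_tokens = torch.randn(1, P * P, D)  # stand-in for CLIP visual tokens
    text_feats = torch.randn(2, D)           # stand-in for prompt embeddings
    print(segment(patch_tokens, text_feats, adapter).shape)  # (1, 1, 224, 224)

The residual form of the adapter keeps the frozen CLIP representation dominant, so only a small number of parameters need supervision from the CAM-derived pseudo-labels.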
Pages: 18
Related Papers
50 results
  • [1] VadCLIP: Adapting Vision-Language Models for Weakly Supervised Video Anomaly Detection
    Wu, Peng
    Zhou, Xuerong
    Pang, Guansong
    Zhou, Lingru
    Yan, Qingsen
    Wang, Peng
    Zhang, Yanning
THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 6, 2024 : 6074 - 6082
  • [2] Weakly Supervised Grounding for VQA in Vision-Language Transformers
    Khan, Aisha Urooj
    Kuehne, Hilde
    Gan, Chuang
    Lobo, Niels Da Vitoria
    Shah, Mubarak
    COMPUTER VISION - ECCV 2022, PT XXXV, 2022, 13695 : 652 - 670
  • [3] Adapting vision-language AI models to cardiology tasks
    Arnaout, Rima
    NATURE MEDICINE, 2024, 30 (05) : 1245 - 1246
  • [4] 3D Weakly Supervised Semantic Segmentation with 2D Vision-Language Guidance
    Xu, Xiaoxu
    Yuan, Yitian
    Li, Jinlong
    Zhang, Qiudan
    Jie, Zequn
    Ma, Lin
    Tang, Hao
    Sebe, Nicu
    Wang, Xu
    COMPUTER VISION - ECCV 2024, PT LXXIII, 2025, 15131 : 87 - 104
  • [5] Adapting Vision-Language Models via Learning to Inject Knowledge
    Xuan, Shiyu
    Yang, Ming
    Zhang, Shiliang
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2024, 33 : 5798 - 5809
  • [6] SemiVL: Semi-Supervised Semantic Segmentation with Vision-Language Guidance
    Hoyer, Lukas
    Tan, David Joseph
    Naeem, Muhammad Ferjad
    Van Gool, Luc
    Tombari, Federico
    COMPUTER VISION - ECCV 2024, PT XXXIX, 2025, 15097 : 257 - 275
  • [7] CLIPping the Deception: Adapting Vision-Language Models for Universal Deepfake Detection
    Khan, Sohail Ahmed
Dang-Nguyen, Duc-Tien
PROCEEDINGS OF THE 4TH ANNUAL ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, ICMR 2024, 2024 : 1006 - 1015
  • [8] Text Promptable Surgical Instrument Segmentation with Vision-Language Models
    Zhou, Zijian
    Alabi, Oluwatosin
    Wei, Meng
    Vercauteren, Tom
    Shi, Miaojing
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023
  • [9] Vision-Language Models for Vision Tasks: A Survey
    Zhang, Jingyi
    Huang, Jiaxing
    Jin, Sheng
    Lu, Shijian
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2024, 46 (08) : 5625 - 5644
  • [10] Language Matters: A Weakly Supervised Vision-Language Pre-training Approach for Scene Text Detection and Spotting
    Xue, Chuhui
    Zhang, Wenqing
    Hao, Yu
    Lu, Shijian
    Torr, Philip H. S.
    Bai, Song
    COMPUTER VISION - ECCV 2022, PT XXVIII, 2022, 13688 : 284 - 302