WeCromCL: Weakly Supervised Cross-Modality Contrastive Learning for Transcription-Only Supervised Text Spotting

Cited by: 0
Authors
Wu, Jingjing [1 ]
Fang, Zhengyao [1 ]
Lyu, Pengyuan [2 ]
Zhang, Chengquan [2 ]
Chen, Fanglin [1 ]
Lu, Guangming [1 ]
Pei, Wenjie [1 ]
Affiliations
[1] Harbin Inst Technol, Shenzhen, Peoples R China
[2] Baidu Inc, Dept Comp Vis Technol, Beijing, Peoples R China
Source
Funding
National Natural Science Foundation of China;
Keywords
Transcription-only supervised text spotting; Weakly supervised cross-modality contrastive learning;
DOI
10.1007/978-3-031-72751-1_17
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Transcription-only Supervised Text Spotting aims to learn text spotters using only transcriptions, without text boundaries, for supervision, thus eliminating expensive boundary annotation. The crux of this task lies in locating each transcription in scene text images without location annotations. In this work, we formulate this challenging problem as a Weakly Supervised Cross-modality Contrastive Learning problem, and design a simple yet effective model dubbed WeCromCL that is able to detect each transcription in a scene image in a weakly supervised manner. Unlike typical methods for cross-modality contrastive learning, which focus on modeling the holistic semantic correlation between an entire image and a text description, our WeCromCL conducts atomistic contrastive learning to model the character-wise appearance consistency between a text transcription and its correlated region in a scene image, thereby detecting an anchor point for the transcription in a weakly supervised manner. The anchor points detected by WeCromCL are further used as pseudo location labels to guide the learning of text spotting. Extensive experiments on four challenging benchmarks demonstrate the superior performance of our model over other methods. Code will be released.
Pages: 289-306
Page count: 18