WeCromCL: Weakly Supervised Cross-Modality Contrastive Learning for Transcription-Only Supervised Text Spotting

Authors
Wu, Jingjing [1]
Fang, Zhengyao [1]
Lyu, Pengyuan [2]
Zhang, Chengquan [2]
Chen, Fanglin [1]
Lu, Guangming [1]
Pei, Wenjie [1]
Affiliations
[1] Harbin Institute of Technology, Shenzhen, People's Republic of China
[2] Baidu Inc., Department of Computer Vision Technology, Beijing, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Transcription-only supervised text spotting; Weakly supervised cross-modality contrastive learning
DOI
10.1007/978-3-031-72751-1_17
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Transcription-only supervised text spotting aims to learn text spotters using only transcriptions for supervision, with no text boundaries, thus eliminating expensive boundary annotation. The crux of this task lies in locating each transcription in scene text images without location annotations. In this work, we formulate this challenging problem as a weakly supervised cross-modality contrastive learning problem and design a simple yet effective model, dubbed WeCromCL, that detects each transcription in a scene image in a weakly supervised manner. Unlike typical methods for cross-modality contrastive learning, which model the holistic semantic correlation between an entire image and a text description, WeCromCL conducts atomistic contrastive learning: it models the character-wise appearance consistency between a text transcription and its correlated region in a scene image in order to detect an anchor point for the transcription. The anchor points detected by WeCromCL are then used as pseudo location labels to guide the learning of text spotting. Extensive experiments on four challenging benchmarks demonstrate the superior performance of our model over other methods. Code will be released.
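The idea in the abstract (match a transcription against image regions, take the best-matching location as an anchor point, and train contrastively across image-text pairs) can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the single-vector transcription embedding (the paper describes character-wise modeling), the cosine similarity, the InfoNCE-style loss, and the temperature value are all assumptions chosen for illustration.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    # Zero vectors are left as zeros (norm falls back to 1.0).
    n = math.sqrt(dot(v, v)) or 1.0
    return [a / n for a in v]

def anchor_and_score(feature_map, text_vec):
    """Return ((row, col), score) for the location whose feature has the
    highest cosine similarity with the transcription embedding."""
    t = normalize(text_vec)
    best_pos, best_score = None, -float("inf")
    for r, row in enumerate(feature_map):
        for c, feat in enumerate(row):
            s = dot(normalize(feat), t)
            if s > best_score:
                best_pos, best_score = (r, c), s
    return best_pos, best_score

def contrastive_loss(feature_maps, text_vecs, temperature=0.1):
    """InfoNCE-style loss (assumed form): transcription i should match its
    own image i better than any other image in the batch."""
    n = len(text_vecs)
    total = 0.0
    for i in range(n):
        logits = [anchor_and_score(feature_maps[j], text_vecs[i])[1] / temperature
                  for j in range(n)]
        m = max(logits)  # subtract the max for a stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]
    return total / n

# Hypothetical 2x2 feature maps with 2-d features, one "text" location each.
img_a = [[[0, 0], [1, 0]], [[0, 0], [0, 0]]]   # text "a" at (0, 1)
img_b = [[[0, 0], [0, 0]], [[0, 1], [0, 0]]]   # text "b" at (1, 0)
t_a, t_b = [1, 0], [0, 1]

anchor_a, score_a = anchor_and_score(img_a, t_a)
aligned = contrastive_loss([img_a, img_b], [t_a, t_b])
swapped = contrastive_loss([img_b, img_a], [t_a, t_b])
```

In this toy setup the anchor for `t_a` falls on the location of the matching feature, and the loss is small for correctly paired images and large when the pairing is swapped. That is the signal which, under the assumptions above, would push a model to localize transcriptions without boundary labels.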
Pages: 289-306
Number of pages: 18