WeCromCL: Weakly Supervised Cross-Modality Contrastive Learning for Transcription-Only Supervised Text Spotting

Authors
Wu, Jingjing [1]
Fang, Zhengyao [1]
Lyu, Pengyuan [2]
Zhang, Chengquan [2]
Chen, Fanglin [1]
Lu, Guangming [1]
Pei, Wenjie [1]
Affiliations
[1] Harbin Inst Technol, Shenzhen, Peoples R China
[2] Baidu Inc, Dept Comp Vis Technol, Beijing, Peoples R China
Source
Funding
National Natural Science Foundation of China;
Keywords
Transcription-only supervised text spotting; Weakly supervised cross-modality contrastive learning;
DOI
10.1007/978-3-031-72751-1_17
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Transcription-only Supervised Text Spotting aims to learn text spotters using only transcriptions, with no text boundaries, for supervision, thus eliminating expensive boundary annotation. The crux of this task lies in locating each transcription in scene text images without location annotations. In this work, we formulate this challenging problem as a Weakly Supervised Cross-modality Contrastive Learning problem and design a simple yet effective model, dubbed WeCromCL, that detects each transcription in a scene image in a weakly supervised manner. Typical methods for cross-modality contrastive learning model the holistic semantic correlation between an entire image and a text description. In contrast, WeCromCL conducts atomistic contrastive learning: it models the character-wise appearance consistency between a text transcription and its correlated region in a scene image in order to detect an anchor point for the transcription. The anchor points detected by WeCromCL are then used as pseudo location labels to guide the learning of text spotting. Extensive experiments on four challenging benchmarks demonstrate the superior performance of our model over other methods. Code will be released.
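The idea of locating a transcription through contrastive learning can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it assumes one embedding vector per transcription (the character-wise modeling described in the abstract is omitted), takes the peak of a cosine-similarity map as the anchor point, and uses a standard InfoNCE loss over image-transcription peak scores. All function names, shapes, and the temperature value are hypothetical.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    # Scale vectors to (approximately) unit length; eps avoids division by zero.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def similarity_maps(pixel_feats, text_embs):
    # pixel_feats: (H, W, D) image feature map (assumed given by some encoder).
    # text_embs:   (N, D) one embedding per transcription (a simplification).
    # Returns (N, H, W) cosine similarity between each transcription and each pixel.
    f = l2_normalize(pixel_feats)
    t = l2_normalize(text_embs)
    return np.einsum("hwd,nd->nhw", f, t)

def anchor_points(sim):
    # Take the location of the peak similarity as the anchor: (N, 2) as (row, col).
    n, h, w = sim.shape
    flat_idx = sim.reshape(n, -1).argmax(axis=1)
    return np.stack(np.unravel_index(flat_idx, (h, w)), axis=1)

def info_nce(scores, temperature=0.07):
    # scores[i, j]: peak similarity of transcription i within image j.
    # Diagonal entries are treated as the matching (positive) pairs.
    logits = scores / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))

# Toy example: two transcriptions "planted" at known pixels of a 4x5 feature map.
pixel_feats = np.zeros((4, 5, 3))
pixel_feats[2, 3] = [1.0, 0.0, 0.0]
pixel_feats[0, 1] = [0.0, 1.0, 0.0]
text_embs = np.array([[2.0, 0.0, 0.0], [0.0, 3.0, 0.0]])

sim = similarity_maps(pixel_feats, text_embs)
anchors = anchor_points(sim)
```

In this toy setting the recovered anchors coincide with the planted pixels; in a weakly supervised pipeline of the kind the abstract describes, such points would then serve as pseudo location labels for training a text spotter.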
Pages: 289-306
Page count: 18