Context-Based Contrastive Learning for Scene Text Recognition

Cited by: 0
Authors
Zhang, Xinyun [1 ]
Zhu, Binwu [1 ]
Yao, Xufeng [1 ]
Sun, Qi [1 ]
Li, Ruiyu [2 ]
Yu, Bei [1 ]
Affiliations
[1] Chinese Univ Hong Kong, Hong Kong, Peoples R China
[2] SmartMore, Hong Kong, Peoples R China
Keywords
DOI
Not available
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Pursuing accurate and robust recognizers has been a long-standing goal for scene text recognition (STR) researchers. Recently, attention-based methods have demonstrated their effectiveness and achieved impressive results on public benchmarks. The attention mechanism enables models to recognize scene text with severe visual distortions by leveraging contextual information. However, recent studies revealed that an implicit over-reliance on context leads to catastrophic out-of-vocabulary performance. In contrast to their superior accuracy on seen text, models are prone to misrecognizing unseen text even when image quality is good. We propose a novel framework, Context-based Contrastive Learning (ConCLR), to alleviate this issue. Our method first generates characters with different contexts via simple image concatenation operations and then optimizes a contrastive loss on their embeddings. By pulling together clusters of identical characters within various contexts and pushing apart clusters of different characters in the embedding space, ConCLR suppresses the side effect of overfitting to specific contexts and learns a more robust representation. Experiments show that ConCLR significantly improves out-of-vocabulary generalization and achieves state-of-the-art performance on public benchmarks together with attention-based recognizers.
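The mechanism described above (recombining word images so that identical characters appear in different contexts, then contrasting their character embeddings) can be sketched in a few lines. The snippet below is a minimal, hypothetical PyTorch illustration, not the authors' implementation: the function names, tensor shapes, temperature value, and the assumption of one pre-computed embedding and class label per decoded character are illustrative choices.

import torch
import torch.nn.functional as F

def make_context_pairs(images):
    # images: (B, C, H, W) batch of resized word images. Each image is
    # concatenated with a randomly chosen partner along the width, so every
    # character now appears inside a different surrounding text (context).
    perm = torch.randperm(images.size(0))
    return torch.cat([images, images[perm]], dim=-1)  # (B, C, H, 2W)

def context_contrastive_loss(embeddings, labels, temperature=0.1):
    # embeddings: (N, D) character-level features gathered from all contexts.
    # labels:     (N,)   character class index for each embedding.
    # Embeddings of the same character (seen in different contexts) are pulled
    # together; embeddings of different characters are pushed apart.
    z = F.normalize(embeddings, dim=-1)
    sim = (z @ z.t()) / temperature                  # (N, N) cosine similarities
    self_mask = torch.eye(z.size(0), dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, -1e9)           # exclude self-pairs
    pos = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    # Mean log-likelihood of positive pairs per anchor; anchors with no other
    # embedding of the same character in the batch are skipped.
    pos_per_anchor = pos.sum(dim=1)
    loss = -(log_prob * pos).sum(dim=1) / pos_per_anchor.clamp(min=1)
    return loss[pos_per_anchor > 0].mean()

# Purely illustrative usage with random tensors:
imgs = torch.randn(8, 3, 32, 100)
paired = make_context_pairs(imgs)                    # (8, 3, 32, 200)
feats = torch.randn(40, 256)                         # e.g. 5 characters per image
char_ids = torch.randint(0, 37, (40,))               # hypothetical character alphabet
print(context_contrastive_loss(feats, char_ids))

In practice, a contrastive term of this kind would be combined with the recognizer's standard recognition loss, consistent with the abstract's statement that ConCLR is used together with attention-based recognizers.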
Pages: 3353-3361
Page count: 9
Related Papers (50 total)
  • [1] Relational Contrastive Learning for Scene Text Recognition
    Zhang, Jinglei
    Lin, Tiancheng
    Xu, Yi
    Chen, Kai
    Zhang, Rui
    [J]. PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 5764 - 5775
  • [2] Text-Level Contrastive Learning for Scene Text Recognition
    Zhuang, Junbin
    Ren, Yixuan
    Li, Xia
    Liang, Zhanpeng
    [J]. 2022 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING (IALP 2022), 2022, : 231 - 236
  • [3] Perceiving Stroke-Semantic Context: Hierarchical Contrastive Learning for Robust Scene Text Recognition
    Liu, Hao
    Wang, Bin
    Bao, Zhimin
    Xue, Mobai
    Kang, Sheng
    Jiang, Deqiang
    Liu, Yinsong
    Ren, Bo
    [J]. THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / THE TWELFTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, : 1702 - 1710
  • [4] Hierarchical Context-Based Emotion Recognition With Scene Graphs
    Wu, Shichao
    Zhou, Lei
    Hu, Zhengxi
    Liu, Jingtai
    [J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, 35 (03) : 3725 - 3739
  • [5] Context-based environmental audio event recognition for scene understanding
    Lu, Tong
    Wang, Gongyou
    Su, Feng
    [J]. MULTIMEDIA SYSTEMS, 2015, 21 (05) : 507 - 524
  • [6] Context-Based Scene Understanding
    Zolghadr, Esfandiar
    Furht, Borko
    [J]. INTERNATIONAL JOURNAL OF MULTIMEDIA DATA ENGINEERING & MANAGEMENT, 2016, 7 (01): : 22 - 40
  • [7] Masked and Permuted Implicit Context Learning for Scene Text Recognition
    Yang, Xiaomeng
    Qiao, Zhi
    Wei, Jin
    Yang, Dongbao
    Zhou, Yu
    [J]. IEEE SIGNAL PROCESSING LETTERS, 2024, 31 : 964 - 968
  • [8] A New Context-Based Method for Restoring Occluded Text in Natural Scene Images
    Mittal, Ayush
    Shivakumara, Palaiahnakote
    Pal, Umapada
    Lu, Tong
    Blumenstein, Michael
    Lopresti, Daniel
    [J]. DOCUMENT ANALYSIS SYSTEMS, 2020, 12116 : 466 - 480
  • [9] Spatial attention contrastive network for scene text recognition
    Wang, Fan
    Yin, Dong
    [J]. JOURNAL OF ELECTRONIC IMAGING, 2022, 31 (04)