End-to-end scene text recognition using tree-structured models

Cited by: 29
Authors
Shi, Cunzhao [1]
Wang, Chunheng [1]
Xiao, Baihua [1]
Gao, Song [1]
Hu, Jinlong [1]
Affiliation
[1] Chinese Acad Sci, Inst Automat, State Key Lab Management & Control Complex Syst, Beijing 100190, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
End-to-end; Scene text recognition; Part-based tree-structured models (TSMs); Normalized pictorial structure; SEGMENTATION; DETECT
DOI
10.1016/j.patcog.2014.03.023
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification code
081104; 0812; 0835; 1405
Abstract
Detecting and recognizing text in natural images is challenging and has received much attention from the computer vision community in recent years. In this paper, we propose a robust end-to-end scene text recognition method that uses tree-structured character models and normalized pictorial structure word models. For each character category, we build a part-based tree-structured model (TSM) that exploits both character-specific structure information and local appearance information. The TSM detects each part of a character and recognizes its unique structure, seamlessly combining character detection and recognition. Because the TSMs can accurately detect characters against complex backgrounds, for text localization we apply the TSMs for all characters to the coarse text detection regions to eliminate false positives and to search for possibly missing characters. For word recognition, we propose a normalized pictorial structure (PS) framework to deal with the bias caused by words of different lengths. Experimental results on a range of challenging public datasets (ICDAR 2003, ICDAR 2011, SVT) demonstrate that the proposed method outperforms state-of-the-art methods for both text localization and word recognition. (C) 2014 Elsevier Ltd. All rights reserved.
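The abstract does not give the normalization used in the PS word model, so the following is only a minimal sketch of the general idea of removing length bias from a word score. It assumes a word score built from per-character detection scores plus pairwise scores between adjacent characters; the function names and the choice of averaging are hypothetical illustrations, not the authors' formulation.

```python
from typing import Sequence


def raw_word_score(char_scores: Sequence[float],
                   pair_scores: Sequence[float]) -> float:
    """Unnormalized score: a plain sum, which grows with word length."""
    return sum(char_scores) + sum(pair_scores)


def normalized_word_score(char_scores: Sequence[float],
                          pair_scores: Sequence[float]) -> float:
    """Length-normalized score (assumed scheme: average each term type).

    char_scores: one detection score per character in the candidate word.
    pair_scores: one score per adjacent character pair (len(char_scores) - 1).
    """
    n = len(char_scores)
    if n == 0:
        raise ValueError("a word needs at least one character")
    if len(pair_scores) != n - 1:
        raise ValueError("expected one pairwise score per adjacent pair")
    unary = sum(char_scores) / n
    # A single-character word has no adjacent pairs, so no pairwise term.
    pairwise = sum(pair_scores) / (n - 1) if n > 1 else 0.0
    return unary + pairwise


# Hypothetical scores: a short word with strong evidence versus a longer
# word with weaker evidence per character.
short_word = ([0.9, 0.9, 0.9], [0.8, 0.8])
long_word = ([0.6] * 6, [0.5] * 5)

# The raw sum prefers the longer word simply because it has more terms.
print(raw_word_score(*long_word) > raw_word_score(*short_word))
# After normalization the better-supported short word ranks higher.
print(normalized_word_score(*short_word) > normalized_word_score(*long_word))
```

With these made-up numbers the raw sum favors the six-character candidate, while the averaged score favors the three-character one, which is the kind of length bias the abstract refers to.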
Pages: 2853-2866
Number of pages: 14
Related papers (50 in total)
  • [1] End-to-End Scene Text Recognition
    Wang, Kai
    Babenko, Boris
    Belongie, Serge
    [J]. 2011 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2011, : 1457 - 1464
  • [2] An End-to-End Scene Text Recognition for Bilingual Text
    Albalawi, Bayan M.
    Jamal, Amani T.
    Al Khuzayem, Lama A.
    Alsaedi, Olaa A.
    [J]. BIG DATA AND COGNITIVE COMPUTING, 2024, 8 (09)
  • [3] A framework for end-to-end learning on semantic tree-structured data
    Woof, William
    Chen, Ke
    [J]. arXiv, 2020,
  • [4] Progressive Tree-Structured Prototype Network for End-to-End Image Captioning
    Zeng, Pengpeng
    Zhu, Jinkuan
    Song, Jingkuan
    Gao, Lianli
    [J]. PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022, : 5210 - 5218
  • [5] Transformer-based end-to-end scene text recognition
    Zhu, Xinghao
    Zhang, Zhi
    [J]. PROCEEDINGS OF THE 2021 IEEE 16TH CONFERENCE ON INDUSTRIAL ELECTRONICS AND APPLICATIONS (ICIEA 2021), 2021, : 1691 - 1695
  • [6] End-to-End Scene Text Recognition with Character Centroid Prediction
    Zhao, Wei
    Ma, Jinwen
    [J]. NEURAL INFORMATION PROCESSING (ICONIP 2017), PT III, 2017, 10636 : 291 - 299
  • [7] EEM: An End-to-end Evaluation Metric for Scene Text Detection and Recognition
    Hao, Jiedong
    Wen, Yafei
    Deng, Jie
    Gan, Jun
    Ren, Shuai
    Tan, Hui
    Chen, Xiaoxin
    [J]. DOCUMENT ANALYSIS AND RECOGNITION, ICDAR 2021, PT IV, 2021, 12824 : 95 - 108
  • [8] Scene Text Recognition using Part-based Tree-structured Character Detection
    Shi, Cunzhao
    Wang, Chunheng
    Xiao, Baihua
    Zhang, Yang
    Gao, Song
    Zhang, Zhong
    [J]. 2013 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2013, : 2961 - 2968
  • [9] Person Re-identification with End-to-End Scene Text Recognition
    Kamlesh
    Xu, Pei
    Yang, Yang
    Xu, Yongchao
    [J]. COMPUTER VISION, PT III, 2017, 773 : 363 - 374
  • [10] An end-to-end model for multi-view scene text recognition
    Banerjee, Ayan
    Shivakumara, Palaiahnakote
    Bhattacharya, Saumik
    Pal, Umapada
    Liu, Cheng-Lin
    [J]. PATTERN RECOGNITION, 2024, 149