Attention-based Text Recognition in the Wild

Authors
Yan, Zhi-Chen [1]
Yu, Stephanie A. [2]
Affiliations
[1] Facebook Research, 1 Hacker Way, Menlo Park, CA 94025 USA
[2] West Island School, Pokfulam, 250 Victoria Rd, Hong Kong, People's Republic of China
Keywords
Attention; Convolution; Deep Learning; LSTM; Text Recognition
DOI
10.5220/0009970200420049
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Recognizing text in real-world scenes is an important research topic in computer vision, and many deep-learning-based techniques have been proposed for it. Such techniques typically follow an encoder-decoder architecture and use a sequence of feature vectors as the intermediate representation. With this approach, useful 2D spatial information in the input image may be lost because of the vector-based encoding. In this paper, we formulate scene text recognition as a spatiotemporal sequence translation problem and introduce a novel attention-based spatiotemporal decoding framework. We first encode an image as a spatiotemporal sequence, which the decoder then translates into a sequence of output characters. The encoding and decoding stages are integrated to form an end-to-end trainable deep network. Experimental results on multiple benchmarks, including IIIT5k, SVT, ICDAR and RCTW-17, indicate that our method can significantly outperform conventional attention frameworks.
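To make the contrast with vector-based encoding concrete, the following is a minimal sketch of one attention step taken directly over a 2D feature map, so that each weight stays tied to a spatial location. It is not the paper's decoder: the dot-product scoring, the function names `softmax` and `attend_2d`, and the nested-list data layout are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a flat list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend_2d(feature_map, query):
    """One attention step over an H x W x C feature map (nested lists).

    Hypothetical helper: scores every spatial location against `query`
    with a dot product (an assumed scoring function), then returns the
    weighted context vector and the H x W grid of attention weights.
    """
    h, w = len(feature_map), len(feature_map[0])
    c = len(query)

    # One score per spatial location, flattened in row-major order.
    scores = [
        sum(f * q for f, q in zip(feature_map[i][j], query))
        for i in range(h)
        for j in range(w)
    ]
    weights = softmax(scores)

    # Context vector: attention-weighted sum of the location features.
    context = [0.0] * c
    for idx, weight in enumerate(weights):
        i, j = divmod(idx, w)
        for k in range(c):
            context[k] += weight * feature_map[i][j][k]

    # Reshape the flat weights back into the H x W layout.
    grid = [weights[i * w:(i + 1) * w] for i in range(h)]
    return context, grid


if __name__ == "__main__":
    fmap = [[[1.0, 0.0], [0.0, 1.0]],
            [[5.0, 0.0], [0.0, 0.0]]]
    context, grid = attend_2d(fmap, [1.0, 0.0])
    for row in grid:
        print(["%.3f" % v for v in row])
```

In a full decoder the query would come from the recurrent state at each output step, and the weight grid would shift from one character region to the next; here the query is fixed so the example stays self-contained.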
Pages: 42-49
Page count: 8