Attention-based Text Recognition in the Wild

Cited by: 0
Authors
Yan, Zhi-Chen [1]
Yu, Stephanie A. [2]
Affiliations
[1] Facebook Research, 1 Hacker Way, Menlo Park, CA 94025, USA
[2] West Island School, 250 Victoria Road, Pokfulam, Hong Kong, People's Republic of China
Keywords
Attention; Convolution; Deep Learning; LSTM; Text Recognition
DOI
10.5220/0009970200420049
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Recognizing text in real-world scenes is an important research topic in computer vision, and many deep-learning-based techniques have been proposed. Such techniques typically follow an encoder-decoder architecture and use a sequence of feature vectors as the intermediate representation. In this approach, useful 2D spatial information in the input image may be lost due to vector-based encoding. In this paper, we formulate scene text recognition as a spatiotemporal sequence translation problem and introduce a novel attention-based spatiotemporal decoding framework. We first encode an image as a spatiotemporal sequence, which the decoder then translates into a sequence of output characters. The encoding and decoding stages are integrated to form an end-to-end trainable deep network. Experimental results on multiple benchmarks, including IIIT5k, SVT, ICDAR and RCTW-17, indicate that our method can significantly outperform conventional attention frameworks.
Pages: 42-49
Page count: 8