Enhanced Attention-Based Encoder-Decoder Framework for Text Recognition

Cited by: 3
Authors
Prabu, S. [1]
Sundar, K. Joseph Abraham [1]
Affiliation
[1] SASTRA University, School of Computing, Thanjavur 613401, India
Keywords
Deep learning; text recognition; text normalization; attention mechanism; convolutional neural network (CNN); scene text; neural network
DOI
10.32604/iasc.2023.029105
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Recognizing irregular text in natural images is a challenging task in computer vision. Existing approaches still have difficulty recognizing irregular text because of its diverse shapes. In this paper, we propose a simple yet powerful irregular text recognition framework based on an encoder-decoder architecture. The proposed framework consists of four main modules. First, in the image transformation module, a Thin Plate Spline (TPS) transformation rectifies the irregular text image into a readable text image. Second, we propose a novel Spatial Attention Module (SAM) that compels the model to concentrate on text regions and obtain enriched feature maps. Third, a deep bidirectional long short-term memory (Bi-LSTM) network turns the visual feature map generated by a Convolutional Neural Network (CNN) into a contextual feature map. Finally, we propose a Dual Step Attention Mechanism (DSAM), integrated with a Connectionist Temporal Classification (CTC)-Attention decoder, that re-weights visual features and focuses on intra-sequence relationships to generate a more accurate character sequence. The effectiveness of the proposed framework is verified through extensive experiments on benchmark datasets such as SVT, ICDAR, CUTE80, and IIIT5k, with performance measured by recognition accuracy. The results demonstrate that our method outperforms existing approaches on both regular and irregular text. Additionally, the robustness of our approach is evaluated on grocery datasets that contain complex text images, such as GroZi-120, Web-Market, SKU-110K, and Freiburg Groceries, where our framework also achieves superior performance.
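The abstract names a CTC branch in the decoder and reports results with an accuracy metric. As a rough illustration only, the sketch below shows generic best-path (greedy) CTC decoding and exact-match word accuracy in plain Python. It is not the authors' DSAM or CTC-Attention decoder; the blank index 0, the toy alphabet, the function names, and the case-insensitive matching are all assumptions made for the example.

```python
from itertools import groupby

BLANK = 0  # assumption: class index 0 is reserved for the CTC blank symbol


def ctc_greedy_decode(frame_probs, alphabet, blank=BLANK):
    """Best-path CTC decoding (hypothetical helper, not from the paper).

    frame_probs: T x C list of per-frame class probabilities, e.g. softmax
                 outputs computed over the Bi-LSTM's contextual features.
    alphabet:    maps class index -> character; alphabet[blank] is never emitted.
    """
    # 1. Pick the most probable class at every time step.
    best_path = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    # 2. Collapse runs of identical consecutive labels.
    collapsed = [label for label, _ in groupby(best_path)]
    # 3. Remove blanks; a blank between two equal labels keeps both (e.g. "aa").
    return "".join(alphabet[label] for label in collapsed if label != blank)


def word_accuracy(predictions, ground_truths, case_sensitive=False):
    """Fraction of words recognized exactly (assumed metric definition)."""
    if not ground_truths:
        return 0.0
    norm = (lambda s: s) if case_sensitive else str.lower
    correct = sum(norm(p) == norm(g) for p, g in zip(predictions, ground_truths))
    return correct / len(ground_truths)


if __name__ == "__main__":
    alphabet = ["<blank>", "c", "a", "t"]  # toy alphabet for illustration
    probs = [
        [0.1, 0.8, 0.05, 0.05],   # "c"
        [0.1, 0.7, 0.1, 0.1],     # "c" (repeat, collapsed)
        [0.9, 0.05, 0.03, 0.02],  # blank
        [0.1, 0.1, 0.7, 0.1],     # "a"
        [0.1, 0.1, 0.1, 0.7],     # "t"
        [0.2, 0.1, 0.1, 0.6],     # "t" (repeat, collapsed)
    ]
    print(ctc_greedy_decode(probs, alphabet))  # cat
    print(word_accuracy(["cat", "Dog"], ["cat", "dog"]))  # 1.0
```

Greedy decoding is the simplest CTC inference rule; a hybrid CTC-Attention decoder such as the one described above would typically combine CTC scores with attention-based predictions rather than rely on the best path alone.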
Pages: 2071-2086
Number of pages: 16