Accurate, data-efficient, unconstrained text recognition with convolutional neural networks

Cited by: 58
Authors
Yousef, Mohamed [1]
Hussain, Khaled F. [1]
Mohammed, Usama S. [2]
Affiliations
[1] Assiut Univ, Fac Comp & Informat, Comp Sci Dept, Asyut 71515, Egypt
[2] Assiut Univ, Elect Engn Dept, Fac Engn, Asyut 71515, Egypt
Keywords
Text recognition; Optical character recognition; Handwriting recognition; CAPTCHA solving; License plate recognition; Convolutional neural network; Deep learning; Scene text; LSTM
DOI
10.1016/j.patcog.2020.107482
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Unconstrained text recognition is an important computer vision task, featuring a wide variety of different sub-tasks, each with its own set of challenges. One of the biggest promises of deep neural networks has been the convergence and automation of feature extractors from input raw signals, allowing for the highest possible performance with minimum required domain knowledge. To this end, we propose a data-efficient, end-to-end neural network model for generic, unconstrained text recognition. In our proposed architecture we strive for simplicity and efficiency without sacrificing recognition accuracy. Our proposed architecture is a fully convolutional network without any recurrent connections trained with the CTC loss function. Thus it operates on arbitrary input sizes and produces strings of arbitrary length in a very efficient and parallelizable manner. We show the generality and superiority of our proposed text recognition architecture by achieving state-of-the-art results on seven public benchmark datasets, covering a wide spectrum of text recognition tasks, namely: Handwriting Recognition, CAPTCHA recognition, OCR, License Plate Recognition, and Scene Text Recognition. Our proposed architecture has won the ICFHR2018 Competition on Automated Text Recognition on a READ Dataset. © 2020 Published by Elsevier Ltd.
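The abstract notes that the network is trained with the CTC loss and emits strings of arbitrary length. As a rough illustration of how a CTC output can be turned into text, here is a minimal sketch of greedy (best-path) CTC decoding. It is not taken from the paper: the function name, the choice of index 0 for the blank symbol, and the toy alphabet are all assumptions made for this example.

```python
from itertools import groupby

BLANK = 0  # assumption: class index 0 is the CTC blank symbol


def ctc_greedy_decode(frame_scores, alphabet):
    """Greedy CTC decoding (hypothetical helper, not the authors' code).

    frame_scores: one list of per-class scores for each output frame.
    alphabet: string where alphabet[i - 1] is the character for class i.
    """
    # 1. Pick the highest-scoring class in every frame.
    best_path = [
        max(range(len(scores)), key=scores.__getitem__)
        for scores in frame_scores
    ]
    # 2. Collapse runs of identical consecutive labels.
    collapsed = [label for label, _ in groupby(best_path)]
    # 3. Drop blanks and map the remaining class indices to characters.
    return "".join(alphabet[label - 1] for label in collapsed if label != BLANK)


# Toy input: 5 frames over the classes (blank, 'a', 'b').
frames = [
    [0.1, 0.8, 0.1],    # 'a'
    [0.1, 0.8, 0.1],    # 'a' again, merged with the previous frame
    [0.9, 0.05, 0.05],  # blank, separates the two 'a' runs
    [0.1, 0.7, 0.2],    # 'a'
    [0.1, 0.2, 0.7],    # 'b'
]
decoded = ctc_greedy_decode(frames, "ab")
```

Because the number of frames only has to be at least the length of the target string, the same decoding step works for any input width, which is what allows a model of this kind to handle inputs and outputs of varying size.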
Pages: 12