Accurate, data-efficient, unconstrained text recognition with convolutional neural networks

Cited by: 58

Authors:

Yousef, Mohamed ^{[1]}

Hussain, Khaled F. ^{[1]}

Mohammed, Usama S. ^{[2]}

Affiliations:

[1] Assiut Univ, Fac Comp & Informat, Comp Sci Dept, Asyut 71515, Egypt

[2] Assiut Univ, Elect Engn Dept, Fac Engn, Asyut 71515, Egypt

Source:

PATTERN RECOGNITION | 2020 / Vol. 108 / Issue 108

Keywords:

Text recognition; Optical character recognition; Handwriting recognition; CAPTCHA Solving; License plate recognition; Convolutional neural network; Deep learning; SCENE TEXT; LSTM;

DOI:

10.1016/j.patcog.2020.107482

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Unconstrained text recognition is an important computer vision task, featuring a wide variety of different sub-tasks, each with its own set of challenges. One of the biggest promises of deep neural networks has been the convergence and automation of feature extractors from input raw signals, allowing for the highest possible performance with minimum required domain knowledge. To this end, we propose a data-efficient, end-to-end neural network model for generic, unconstrained text recognition. In our proposed architecture we strive for simplicity and efficiency without sacrificing recognition accuracy. Our proposed architecture is a fully convolutional network without any recurrent connections trained with the CTC loss function. Thus it operates on arbitrary input sizes and produces strings of arbitrary length in a very efficient and parallelizable manner. We show the generality and superiority of our proposed text recognition architecture by achieving state-of-the-art results on seven public benchmark datasets, covering a wide spectrum of text recognition tasks, namely: Handwriting Recognition, CAPTCHA recognition, OCR, License Plate Recognition, and Scene Text Recognition. Our proposed architecture has won the ICFHR2018 Competition on Automated Text Recognition on a READ Dataset. (C) 2020 Published by Elsevier Ltd.
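The abstract states that the network is trained with the CTC loss and produces strings of arbitrary length. As a rough illustration of how per-frame outputs can become a variable-length string, the following is a minimal sketch of greedy (best-path) CTC decoding. It is not the authors' code: the abstract does not say which decoding strategy is used at inference, and the function name `ctc_greedy_decode`, the blank index `0`, and the toy scores are all assumptions made for this example.

```python
from itertools import groupby

BLANK = 0  # assumed index of the CTC blank symbol


def ctc_greedy_decode(frame_scores, alphabet, blank=BLANK):
    """Collapse per-frame predictions into a string (best-path CTC decoding).

    frame_scores: list of per-frame score lists, one score per class.
    alphabet: maps class index -> character; the blank index has no entry.
    """
    # Pick the highest-scoring class independently for every frame.
    best_path = [
        max(range(len(scores)), key=scores.__getitem__)
        for scores in frame_scores
    ]
    # Merge consecutive repeats first, then drop the blank symbol.
    collapsed = [label for label, _ in groupby(best_path)]
    return "".join(alphabet[label] for label in collapsed if label != blank)


# Hypothetical 3-class example: blank, "a", "b".
alphabet = {1: "a", 2: "b"}
scores = [
    [0.10, 0.80, 0.10],  # "a"
    [0.10, 0.70, 0.20],  # "a" again, merged with the previous frame
    [0.90, 0.05, 0.05],  # blank, separates the two "a" characters
    [0.20, 0.70, 0.10],  # "a" (a new character, because a blank intervened)
    [0.10, 0.20, 0.70],  # "b"
]
decoded = ctc_greedy_decode(scores, alphabet)
print(decoded)  # prints "aab"
```

The number of frames depends on the input width, so the decoded string length is not fixed in advance, which is consistent with the abstract's claim about arbitrary input sizes and output lengths.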

Pages: 12
