Residual attention-based multi-scale script identification in scene text images

Cited by: 0
Authors
Ma M. [1]
Wang Q.-F. [1]
Huang S. [2]
Huang S. [2]
Goulermas Y. [3]
Huang K. [1]
Affiliations
[1] Department of Intelligent Science, School of Advanced Technology, Xi'an Jiaotong-Liverpool University, Suzhou
[2] Tencent Technology Co. Ltd, Beijing
[3] Department of Computer Science, University of Liverpool, Liverpool
Funding
National Natural Science Foundation of China;
Keywords
Attention mechanism; Feature fusion; Global max pooling; Multi-scale features; Script identification;
DOI
10.1016/j.neucom.2020.09.015
Abstract
Script identification is an essential step in the text extraction pipeline for multi-lingual applications. This paper presents an effective approach to identifying scripts in scene text images. Because of complicated backgrounds, varied text styles, and the similarity of characters across languages, script identification remains an unsolved problem. Within the general classification framework for script identification, we investigate two important components: feature extraction and the classification layer. For feature extraction, we use a hierarchical feature fusion block to extract multi-scale features, and we adopt an attention mechanism to obtain the locally discriminative parts of the feature maps. In the classification layer, we use a fully convolutional classifier to generate channel-level classifications, which are then processed by a global pooling layer to improve classification efficiency. We evaluated the proposed approach on the benchmark datasets RRC-MLT2017, SIW-13, CVSI-2015 and MLe2e, and the experimental results show the effectiveness of each elaborately designed component. Finally, we achieve better performance than competitive models, with correct rates of 89.66%, 96.11%, 98.78% and 97.20% on RRC-MLT2017, SIW-13, CVSI-2015 and MLe2e, respectively. © 2020 Elsevier B.V.
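The classification layer described in the abstract can be illustrated with a minimal sketch. It assumes the fully convolutional classifier is a 1x1 convolution producing one score map per script class, and that the global pooling is max pooling (as the keywords suggest). The function names, toy feature map, and weights are hypothetical and do not come from the paper; the feature fusion block and attention mechanism are not modelled here.

```python
import math


def conv1x1_classifier(feature_map, weights, biases):
    """Apply a 1x1 convolution: one score per class at every spatial location.

    feature_map: H x W x C nested lists; weights: K x C; biases: length K.
    Returns K score maps, each H x W.
    """
    score_maps = []
    for w_k, b_k in zip(weights, biases):
        score_maps.append([
            [sum(w * x for w, x in zip(w_k, pixel)) + b_k for pixel in row]
            for row in feature_map
        ])
    return score_maps


def global_max_pool(score_maps):
    """Collapse each H x W score map to its single largest value."""
    return [max(max(row) for row in score_map) for score_map in score_maps]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(feature_map, weights, biases):
    """Return (predicted class index, class probabilities)."""
    logits = global_max_pool(conv1x1_classifier(feature_map, weights, biases))
    probs = softmax(logits)
    return max(range(len(probs)), key=probs.__getitem__), probs


# Toy input (hypothetical): a 2 x 2 feature map with 2 channels, 2 classes.
feature_map = [
    [[0.1, 0.0], [0.2, 0.1]],
    [[0.0, 0.9], [0.1, 0.2]],
]
weights = [[1.0, 0.0], [0.0, 1.0]]  # class k reads channel k only
biases = [0.0, 0.0]

label, probs = classify(feature_map, weights, biases)
```

Because max pooling keeps only the strongest response in each class map, a single highly discriminative location can decide the prediction, and no fully connected layer tied to a fixed input size is needed.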
Pages: 222-233
Number of pages: 11