MuSIC: A Novel Multi-Scale Deep Neural Framework for Script Identification in the Wild

Authors
Khan, Tauseef [1]
Saif, Md. [2]
Mollah, Ayatullah Faruk [2]
Affiliations
[1] VIT AP Univ, Sch Comp Sci & Engn, Amaravati 522237, Andhra Pradesh, India
[2] Aliah Univ, Dept Comp Sci & Engn, Kolkata 700160, India
Source
IEEE Access | 2024 / Vol. 12
Keywords
Feature extraction; Convolutional neural networks; Image segmentation; Three-dimensional displays; Recurrent neural networks; Real-time systems; Accuracy; Visualization; Text recognition; Semantics; Digital images; Script identification; MuSIC; multi-scale; convolutional neural network; multi-script; LEVEL; CLASSIFICATION; FEATURES;
DOI
10.1109/ACCESS.2024.3494023
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Script identification in digital images is crucial for automated text reading in multilingual contexts. Developing a robust script identifier for complex environments is challenging, given the prevalence of mobile devices and digitized documents. This paper presents a novel multi-scale image classification framework, named MuSIC, for identifying scripts in document, scene, and video text. First, multiple CNNs simultaneously process scaled maps of the input image to produce scale-wise predictions. A weight computation module assigns a unique weight to each scaled map by measuring the deviation in the number of object-area pixels relative to the original image. In the weight-aware decision mechanism, the bag-of-prediction scores for the possible output classes are updated by aggregating the scale weights whenever a CNN prediction matches the class. Finally, the class with the highest score is selected as the script of the image. Key features of MuSIC include scale-wise weight computation followed by a weight-aware decision mechanism, which yields more accurate outcomes than conventional majority voting in multi-scale image classification. MuSIC is evaluated on three public datasets: AUTNT(s), CVSI-2015, and ICDAR 2019-MLT, which include Indic, non-Indic, East Asian, and Indian regional scripts across documents, scenes, and videos. The model achieves classification accuracies of 98.28%, 96.18%, and 98.03% on three subsets of AUTNT, viz. AUTNT-document, AUTNT-scene, and AUTNT-mixed, and 95.92% and 93.83% on CVSI-2015 and ICDAR 2019-MLT, respectively. The results demonstrate the robustness of MuSIC across multiple assessments. The source code, usage guidelines, and initial benchmark performance of MuSIC are available at https://github.com/iilabau/MuSIC for academic, research, and non-commercial purposes.
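The weight-aware decision step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation (see the repository linked above for that). The abstract does not give the weight formula, so `scale_weight` below is a hypothetical choice (one minus the relative deviation in object-pixel count, clipped at zero), and all function names, labels, and pixel counts are assumptions made for illustration only.

```python
from collections import defaultdict


def scale_weight(object_pixels_scaled, object_pixels_original):
    """Hypothetical weight for one scaled map (ASSUMED formula, not from the paper).

    The weight is 1.0 when the scaled map preserves the object area exactly
    and shrinks toward 0.0 as the object-pixel count deviates from the original.
    """
    deviation = abs(object_pixels_scaled - object_pixels_original) / object_pixels_original
    return max(0.0, 1.0 - deviation)


def weight_aware_decision(predictions, weights):
    """Add each scale's weight to the score of the class its CNN predicted,
    then return the highest-scoring class together with all scores."""
    scores = defaultdict(float)
    for label, weight in zip(predictions, weights):
        scores[label] += weight
    return max(scores, key=scores.get), dict(scores)


def majority_vote(predictions):
    """Conventional baseline: every scale counts equally."""
    counts = defaultdict(int)
    for label in predictions:
        counts[label] += 1
    return max(counts, key=counts.get)


# Illustrative inputs (made up): three scales, one prediction per scale.
predictions = ["Bengali", "Devanagari", "Devanagari"]
original_pixels = 1000
scaled_pixels = [1000, 400, 300]  # object-area pixel count in each scaled map

weights = [scale_weight(p, original_pixels) for p in scaled_pixels]
weighted_label, scores = weight_aware_decision(predictions, weights)

print("weights:", weights)
print("weight-aware decision:", weighted_label)
print("majority vote:", majority_vote(predictions))
```

With these made-up inputs, the single scale that preserves the object area outweighs the two heavily distorted scales, so the weight-aware decision and the majority vote disagree. Whether the real weights behave this way depends on the formula in the paper, which this sketch does not reproduce.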
Pages: 166955 - 166976
Page count: 22
Related papers
50 in total
  • [1] A Novel Multi-scale Deep Neural Framework for Script Invariant Text Detection
    Tauseef Khan
    Ayatullah Faruk Mollah
    Neural Processing Letters, 2022, 54 : 1371 - 1397
  • [2] A Novel Multi-scale Deep Neural Framework for Script Invariant Text Detection
    Khan, Tauseef
    Mollah, Ayatullah Faruk
    NEURAL PROCESSING LETTERS, 2022, 54 (02) : 1371 - 1397
  • [3] PulseID: Multi-scale photoplethysmographic identification using a deep convolutional neural network
    Wei, Riling
    Xu, Xiaogang
    Li, Yue
    Zhang, Yiyi
    Wang, Jun
    Chen, Hanjie
    BIOMEDICAL SIGNAL PROCESSING AND CONTROL, 2024, 88
  • [4] Multi-scale wavelet texture-based script identification method
    Zeng, Li
    Tang, Yuanyan
    Chen, Tinghuai
    Jisuanji Xuebao/Chinese Journal of Computers, 2000, 23 (07): : 699 - 704
  • [5] A Multi-scale Triplet Deep Convolutional Neural Network for Person Re-identification
    Xiong, Mingfu
    Chen, Jun
    Wang, Zhongyuan
    Liang, Chao
    Lei, Bohan
    Hu, Ruimin
    IMAGE AND VIDEO TECHNOLOGY (PSIVT 2017), 2018, 10799 : 30 - 41
  • [6] A novel diagnostic framework based on vibration image encoding and multi-scale neural network
    Guan, Yang
    Meng, Zong
    Li, Jimeng
    Cao, Wei
    Sun, Dengyun
    Liu, Jingbo
    Fan, Fengjie
    EXPERT SYSTEMS WITH APPLICATIONS, 2024, 251
  • [7] A Customized Multi-Scale Deep Learning Framework for Storm Nowcasting
    Yang, Shangshang
    Yuan, Huiling
    GEOPHYSICAL RESEARCH LETTERS, 2023, 50 (13)
  • [8] Adaptive multi-scale Graph Neural Architecture Search framework
    Yang, Lintao
    Lio, Pietro
    Shen, Xu
    Zhang, Yuyang
    Peng, Chengbin
    NEUROCOMPUTING, 2024, 599
  • [9] Residual attention-based multi-scale script identification in scene text images
    Ma, Mengkai
    Wang, Qiu-Feng
    Huang, Shan
    Huang, Shen
    Goulermas, Yannis
    Huang, Kaizhu
    NEUROCOMPUTING, 2021, 421 : 222 - 233