MuSIC: A Novel Multi-Scale Deep Neural Framework for Script Identification in the Wild

Authors
Khan, Tauseef [1]
Saif, Md. [2]
Mollah, Ayatullah Faruk [2]
Affiliations
[1] VIT AP Univ, Sch Comp Sci & Engn, Amaravati 522237, Andhra Pradesh, India
[2] Aliah Univ, Dept Comp Sci & Engn, Kolkata 700160, India
Source
IEEE ACCESS | 2024 / Vol. 12
Keywords
Feature extraction; Convolutional neural networks; Image segmentation; Three-dimensional displays; Recurrent neural networks; Real-time systems; Accuracy; Visualization; Text recognition; Semantics; Digital images; Script identification; MuSIC; multi-scale; convolutional neural network; multi-script; LEVEL; CLASSIFICATION; FEATURES
DOI
10.1109/ACCESS.2024.3494023
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Script identification in digital images is crucial for automated text reading in multilingual contexts. Developing a robust script identifier for complex environments is challenging, given the prevalence of mobile devices and digitized documents. This paper presents a novel multi-scale image classification framework, named MuSIC, for identifying scripts in document, scene, and video text. First, multiple CNNs simultaneously process scaled maps of the input image to produce scale-wise predictions. A weight computation module then assigns a unique weight to each scaled map by measuring the deviation in the number of object-area pixels relative to the original image. In the weight-aware decision mechanism, the bag-of-prediction scores for the possible output classes are updated by adding a scale's weight to a class whenever that scale's CNN prediction matches the class. Finally, the class with the highest score is selected as the output class for the script of the image. The key features of MuSIC are scale-wise weight computation followed by a weight-aware decision mechanism, which yields more accurate outcomes than conventional majority voting in multi-scale image classification. MuSIC is evaluated on three public datasets, AUTNT(s), CVSI-2015, and ICDAR 2019-MLT, which include Indic, non-Indic, East Asian, and Indian regional scripts across documents, scenes, and videos. The model achieves classification accuracies of 98.28%, 96.18%, and 98.03% on three subsets of AUTNT, namely AUTNT-document, AUTNT-scene, and AUTNT-mixed, and 95.92% and 93.83% on CVSI-2015 and ICDAR 2019-MLT, respectively. The results demonstrate the robustness of MuSIC across multiple assessments. The source code, usage guidelines, and initial benchmark performance of MuSIC are available at https://github.com/iilabau/MuSIC for academic, research, and non-commercial purposes.
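The weight-aware decision step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the script labels, and the weight values are hypothetical, and the weights are taken as given because the abstract does not state the formula that derives them from the pixel-count deviation. The sketch only shows how adding scale weights to a bag of class scores can reach a different decision than plain majority voting.

```python
from collections import Counter, defaultdict


def weight_aware_decision(predictions, weights):
    """Pick the class whose accumulated scale weights are highest.

    predictions: one predicted class label per scale (hypothetical labels).
    weights: one weight per scale, assumed to be precomputed elsewhere.
    """
    if len(predictions) != len(weights):
        raise ValueError("need exactly one weight per scale-wise prediction")
    if not predictions:
        raise ValueError("need at least one scale-wise prediction")

    # Bag of prediction scores: each scale adds its weight to the class it predicted.
    scores = defaultdict(float)
    for label, weight in zip(predictions, weights):
        scores[label] += weight

    winner = max(scores, key=scores.get)
    return winner, dict(scores)


def majority_vote(predictions):
    """Baseline for comparison: every scale counts equally."""
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical outputs of three scale-specific CNNs and their assumed weights.
scale_predictions = ["Bengali", "Devanagari", "Devanagari"]
scale_weights = [0.9, 0.3, 0.4]

winner, scores = weight_aware_decision(scale_predictions, scale_weights)
print(winner)                            # Bengali
print(majority_vote(scale_predictions))  # Devanagari
```

In this made-up case the single high-weight scale outweighs two low-weight scales that agree with each other, so the two rules disagree; with equal weights the weight-aware rule reduces to majority voting.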
Pages: 166955-166976
Page count: 22