MuSIC: A Novel Multi-Scale Deep Neural Framework for Script Identification in the Wild

Cited by: 0
Authors
Khan, Tauseef [1]
Saif, Md. [2]
Mollah, Ayatullah Faruk [2]
Affiliations
[1] VIT AP Univ, Sch Comp Sci & Engn, Amaravati 522237, Andhra Pradesh, India
[2] Aliah Univ, Dept Comp Sci & Engn, Kolkata 700160, India
Source
IEEE ACCESS | 2024 / Vol. 12
Keywords
Feature extraction; Convolutional neural networks; Image segmentation; Three-dimensional displays; Recurrent neural networks; Real-time systems; Accuracy; Visualization; Text recognition; Semantics; Digital images; Script identification; MuSIC; multi-scale; convolutional neural network; multi-script; LEVEL; CLASSIFICATION; FEATURES;
DOI
10.1109/ACCESS.2024.3494023
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Script identification in digital images is crucial for automated text reading in multilingual contexts. Developing a robust script identifier for complex environments is challenging, given the prevalence of mobile devices and digitized documents. This paper presents a novel multi-scale image classification framework, named MuSIC, for identifying scripts in document, scene, and video text. First, multiple CNNs simultaneously process scaled maps of the input image to produce scale-wise predictions. A weight computation module assigns a unique weight to each scaled map by measuring the deviation in the number of pixels of the object area compared to the original image. In the weight-aware decision mechanism, the bag-of-prediction scores for the possible output classes are updated by aggregating the scale weights whenever a CNN prediction matches the class. Finally, the class with the highest score is selected as the output class for the script of the image. Key features of MuSIC include scale-wise weight computation followed by a weight-aware decision mechanism, resulting in more accurate outcomes than conventional majority voting in multi-scale image classification. MuSIC is evaluated on three public datasets: AUTNT(s), CVSI-2015, and ICDAR 2019-MLT, which include Indic, non-Indic, East Asian, and Indian regional scripts across documents, scenes, and videos. The model achieves classification accuracies of 98.28%, 96.18%, and 98.03% on three subsets of AUTNT, viz. AUTNT-document, AUTNT-scene, and AUTNT-mixed, and 95.92% and 93.83% on CVSI-2015 and ICDAR 2019-MLT, respectively. The results demonstrate the robustness of MuSIC across multiple assessments. The source code, usage guidelines, and initial benchmark performance of MuSIC are available at https://github.com/iilabau/MuSIC for academic, research, and non-commercial purposes.
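The weight-aware decision step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the placeholder class labels, and in particular the `scale_weight` formula are assumptions made for illustration, since the abstract states only that a weight is derived from the deviation in object-area pixel count relative to the original image. The sketch contrasts weighted aggregation with plain majority voting over per-scale predictions.

```python
from collections import Counter, defaultdict


def scale_weight(object_pixels_scaled, object_pixels_original):
    """Hypothetical weighting rule (an assumption, not taken from the paper):
    the smaller the relative deviation in object-area pixels, the larger the weight."""
    deviation = abs(object_pixels_scaled - object_pixels_original) / object_pixels_original
    return 1.0 / (1.0 + deviation)


def weight_aware_decision(predictions, weights):
    """Add each scale's weight to the score of the class its CNN predicted,
    then return the class with the highest accumulated score."""
    scores = defaultdict(float)
    for label, weight in zip(predictions, weights):
        scores[label] += weight
    best = max(scores, key=scores.get)
    return best, dict(scores)


def majority_vote(predictions):
    """Conventional baseline: every scale counts equally."""
    return Counter(predictions).most_common(1)[0][0]


# Three scales, placeholder class labels and illustrative weights.
predictions = ["script_A", "script_B", "script_B"]
weights = [0.9, 0.3, 0.4]

weighted_label, scores = weight_aware_decision(predictions, weights)
baseline_label = majority_vote(predictions)
print(weighted_label, baseline_label)
```

With these illustrative numbers the two rules disagree: majority voting picks the class predicted at two scales, while the weighted rule picks the class backed by the single most heavily weighted scale. Whether that behaviour helps in practice depends on how the weights are actually computed, which the abstract does not specify.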
Pages: 166955 - 166976
Page count: 22