MuSIC: A Novel Multi-Scale Deep Neural Framework for Script Identification in the Wild

Cited by: 0
Authors
Khan, Tauseef [1]
Saif, Md. [2]
Mollah, Ayatullah Faruk [2]
Affiliations
[1] VIT AP Univ, Sch Comp Sci & Engn, Amaravati 522237, Andhra Prades, India
[2] Aliah Univ, Dept Comp Sci & Engn, Kolkata 700160, India
Source
IEEE ACCESS | 2024 / Vol. 12
Keywords
Feature extraction; Convolutional neural networks; Image segmentation; Three-dimensional displays; Recurrent neural networks; Real-time systems; Accuracy; Visualization; Text recognition; Semantics; Digital images; Script identification; MuSIC; multi-scale; convolutional neural network; multi-script; LEVEL; CLASSIFICATION; FEATURES;
DOI
10.1109/ACCESS.2024.3494023
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Script identification in digital images is crucial for automated text reading in multilingual contexts. Developing a robust script identifier for complex environments is challenging due to the prevalence of mobile devices and digitized documents. This paper presents a novel multi-scale image classification framework, named MuSIC, for identifying scripts in document, scene, and video text. First, multiple CNNs simultaneously process scaled maps of the input image to produce scale-wise predictions. A weight computation module assigns a unique weight to each scaled map by measuring the deviation in the number of pixels in the object area relative to the original image. In the weight-aware decision mechanism, the bag-of-prediction scores for the possible output classes are updated by aggregating the scale weights whenever a CNN prediction matches the class. Finally, the class with the highest score is selected as the script of the image. The key features of MuSIC are scale-wise weight computation followed by a weight-aware decision mechanism, which yields more accurate outcomes than conventional majority voting in multi-scale image classification. MuSIC is evaluated on three public datasets, AUTNT(s), CVSI-2015, and ICDAR 2019-MLT, which include Indic, non-Indic, East Asian, and Indian regional scripts across documents, scenes, and videos. The model achieves classification accuracies of 98.28%, 96.18%, and 98.03% on three subsets of AUTNT (AUTNT-document, AUTNT-scene, and AUTNT-mixed), and 95.92% and 93.83% on CVSI-2015 and ICDAR 2019-MLT, respectively. The results demonstrate the robustness of MuSIC across multiple assessments. The source code, usage guidelines, and initial benchmark performance of MuSIC are available at https://github.com/iilabau/MuSIC for academic, research, and non-commercial purposes.
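The difference between the weight-aware decision described in the abstract and plain majority voting can be illustrated with a minimal Python sketch. This is not the authors' implementation (see the repository linked above for that). The function names `weighted_vote` and `majority_vote`, the script labels, and the weight values are all hypothetical, and the per-scale weights are taken as given inputs because the abstract does not state the formula used to compute them.

```python
from collections import Counter, defaultdict


def weighted_vote(predictions, weights):
    """Pick the class whose matching scales carry the largest total weight.

    predictions: one predicted class label per scale (hypothetical input).
    weights: one non-negative weight per scale (assumed to be precomputed;
             the paper's actual weight formula is not reproduced here).
    """
    if len(predictions) != len(weights):
        raise ValueError("need exactly one weight per scale-wise prediction")
    scores = defaultdict(float)  # bag of prediction scores, one per class
    for label, weight in zip(predictions, weights):
        scores[label] += weight  # add the scale weight to the predicted class
    best = max(scores, key=scores.get)
    return best, dict(scores)


def majority_vote(predictions):
    """Conventional baseline: every scale counts equally."""
    return Counter(predictions).most_common(1)[0][0]


# Illustrative values only: three scales, one of them weighted heavily.
scale_predictions = ["Bengali", "Devanagari", "Devanagari"]
scale_weights = [0.6, 0.2, 0.3]

weighted_label, score_bag = weighted_vote(scale_predictions, scale_weights)
majority_label = majority_vote(scale_predictions)
print(weighted_label, majority_label)
```

With these made-up numbers the two rules disagree: the heavily weighted scale outvotes the two lightly weighted ones, whereas majority voting follows the label count alone.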
Pages: 166955 - 166976
Page count: 22
Related papers
50 items in total
  • [31] Multi-scale deep context convolutional neural networks for semantic segmentation
    Zhou, Quan
    Yang, Wenbing
    Gao, Guangwei
    Ou, Weihua
    Lu, Huimin
    Chen, Jie
    Latecki, Longin Jan
    WORLD WIDE WEB-INTERNET AND WEB INFORMATION SYSTEMS, 2019, 22 (02): : 555 - 570
  • [32] Deep Multi-scale Convolutional Neural Network for Hyperspectral Image Classification
    Zhang Feng-zhe
    Yang Xia
    NINTH INTERNATIONAL CONFERENCE ON GRAPHIC AND IMAGE PROCESSING (ICGIP 2017), 2018, 10615
  • [33] Multi-Scale Deep Neural Network for Mitosis Detection in Histological Images
    Kausar, Tasleem
    Wang, MingJiang
    Wu, Boqian
    Idrees, Muhammad
    Kanwal, Benish
    2018 INTERNATIONAL CONFERENCE ON INTELLIGENT INFORMATICS AND BIOMEDICAL SCIENCES (ICIIBMS), 2018, : 47 - 51
  • [34] MADNN: A Multi-scale Attention Deep Neural Network for Arrhythmia Classification
    Duan, Ran
    He, Xiaodong
    Ouyang, Zhuoran
    2020 COMPUTING IN CARDIOLOGY, 2020,
  • [35] Deep Multi-scale Convolutional Neural Network for Dynamic Scene Deblurring
    Nah, Seungjun
    Kim, Tae Hyun
    Lee, Kyoung Mu
    30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, : 257 - 265
  • [36] A Multi-scale Convolutional Neural Network Architecture for Music Auto-Tagging
    Dabral, Tanmaya Shekhar
    Deshmukh, Amala Sanjay
    Malapati, Aruna
    SOFT COMPUTING FOR PROBLEM SOLVING, SOCPROS 2017, VOL 1, 2019, 816 : 757 - 764
  • [37] Multi-Scale and Multi-Task Deep Learning Framework for Automatic Road Extraction
    Lu, Xiaoyan
    Zhong, Yanfei
    Zheng, Zhuo
    Liu, Yanfei
    Zhao, Ji
    Ma, Ailong
    Yang, Jie
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2019, 57 (11): : 9362 - 9377
  • [38] Script identification in the wild via discriminative convolutional neural network
    Shi, Baoguang
    Bai, Xiang
    Yao, Cong
    PATTERN RECOGNITION, 2016, 52 : 448 - 458
  • [39] Let's explain crisis: deep multi-scale hierarchical attention framework for crisis-task identification
    Priya, Shalini
    Joshi, Vaishali
    Chandra, Joydeep
    JOURNAL OF SUPERCOMPUTING, 2024, 80 (12): : 17923 - 17951
  • [40] Small Sample Bearing Fault Identification Method Using Novel Multi-scale Convolutional Neural Network
    Xing Z.
    Zhao R.
    Wu Y.
    He T.
    Zhendong Ceshi Yu Zhenduan/Journal of Vibration, Measurement and Diagnosis, 2023, 43 (05): : 915 - 922