Word-Level Thirteen Official Indic Languages Database for Script Identification in Multi-script Documents

Times cited: 0
Authors
Obaidullah, Sk Md [1]
Santosh, K. C. [2]
Halder, Chayan [3]
Das, Nibaran [4]
Roy, Kaushik [3]
Affiliations
[1] Aliah Univ Kolkata, Dept Comp Sci & Engn, Kolkata, W Bengal, India
[2] Univ South Dakota, Dept Comp Sci, Vermillion, SD 57069 USA
[3] Jadavpur Univ, Dept Comp Sci & Engn, Kolkata, India
[4] West Bengal State Univ, Dept Comp Sci, Kolkata, India
Keywords
Multi-script documents; Official Indic script database; Script identification
DOI
10.1007/978-981-10-4859-3_2
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Without a publicly available database, we can neither advance research nor make a fair comparison with state-of-the-art methods. To bridge this gap, we present a database of eleven Indic scripts from thirteen official languages for script identification in multi-script document images. Our database comprises 39K words that are equally distributed (i.e., 3K words per language). We also study three pertinent features: spatial energy (SE), wavelet energy (WE) and the Radon transform (RT), including their possible combinations, using three classifiers: multilayer perceptron (MLP), fuzzy unordered rule induction algorithm (FURIA) and random forest (RF). In our tests, using all features, MLP is the best performer, achieving bi-script accuracies of 99.24% (keeping Roman common) and 98.38% (keeping Devanagari common), and a tri-script accuracy of 98.19% (keeping both Devanagari and Roman common).
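As a rough illustration of the experimental design summarized in the abstract, the following sketch enumerates the feature combinations and classifier pairings and checks the dataset size. It assumes that "possible combinations" means every non-empty subset of the three features; the function names `feature_combinations` and `experiment_grid` are hypothetical and do not come from the paper.

```python
from itertools import combinations

# Names taken from the abstract; everything else here is an illustrative assumption.
FEATURES = ["SE", "WE", "RT"]          # spatial energy, wavelet energy, Radon transform
CLASSIFIERS = ["MLP", "FURIA", "RF"]
LANGUAGES = 13
WORDS_PER_LANGUAGE = 3000              # "3K words per language"


def feature_combinations(features):
    """Return every non-empty subset of the features, smallest subsets first."""
    return [
        combo
        for size in range(1, len(features) + 1)
        for combo in combinations(features, size)
    ]


def experiment_grid(features, classifiers):
    """Pair each feature combination with each classifier (hypothetical grid)."""
    return [
        (combo, clf)
        for combo in feature_combinations(features)
        for clf in classifiers
    ]


total_words = LANGUAGES * WORDS_PER_LANGUAGE
grid = experiment_grid(FEATURES, CLASSIFIERS)

print(total_words)                           # 39000, matching the stated 39K words
print(len(feature_combinations(FEATURES)))   # 7 non-empty feature subsets
print(len(grid))                             # 21 feature-set/classifier pairings
```

Under this assumption, the reported best result corresponds to the single grid entry that pairs all three features with MLP.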
Pages: 16-27
Number of pages: 12