Word-Level Thirteen Official Indic Languages Database for Script Identification in Multi-script Documents

Authors
Obaidullah, Sk Md [1]
Santosh, K. C. [2]
Halder, Chayan [3]
Das, Nibaran [4]
Roy, Kaushik [3]
Affiliations
[1] Aliah Univ Kolkata, Dept Comp Sci & Engn, Kolkata, W Bengal, India
[2] Univ South Dakota, Dept Comp Sci, Vermillion, SD 57069 USA
[3] Jadavpur Univ, Dept Comp Sci & Engn, Kolkata, India
[4] West Bengal State Univ, Dept Comp Sci, Kolkata, India
Keywords
Multi-script documents; Official Indic script database; Script identification
DOI
10.1007/978-981-10-4859-3_2
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Without a publicly available database, we can neither advance research nor make a fair comparison with state-of-the-art methods. To bridge this gap, we present a database of eleven Indic scripts from thirteen official languages for script identification in multi-script document images. The database comprises 39K words, equally distributed across languages (i.e., 3K words per language). We also study three pertinent features: spatial energy (SE), wavelet energy (WE), and the Radon transform (RT), including their possible combinations, using three classifiers: multilayer perceptron (MLP), fuzzy unordered rule induction algorithm (FURIA), and random forest (RF). In our tests, using all features, MLP performs best, with bi-script accuracies of 99.24% (keeping Roman common) and 98.38% (keeping Devanagari common), and a tri-script accuracy of 98.19% (keeping both Devanagari and Roman common).
Pages: 16-27
Number of pages: 12
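
The abstract names wavelet energy (WE) as one of the three features but this record does not give its definition. The following is a minimal sketch of one common way such a feature can be computed, assuming a single-level orthonormal Haar decomposition and the sum of squared coefficients per sub-band as the energy. The function name `haar_subband_energies`, the sub-band labels, and the cropping to even dimensions are illustrative assumptions, not the authors' method.

```python
import numpy as np


def haar_subband_energies(image):
    """Return the energy of each sub-band of a one-level 2D Haar transform.

    Assumption: "energy" means the sum of squared coefficients. The paper's
    exact definition is not stated in this record.
    """
    img = np.asarray(image, dtype=float)
    # Crop to even height and width so the image splits into 2x2 blocks.
    rows, cols = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:rows, :cols]

    # The four pixels of every 2x2 block.
    a = img[0::2, 0::2]
    b = img[0::2, 1::2]
    c = img[1::2, 0::2]
    d = img[1::2, 1::2]

    # Orthonormal Haar sub-bands (approximation plus three detail bands).
    subbands = {
        "LL": (a + b + c + d) / 2.0,
        "LH": (a - b + c - d) / 2.0,
        "HL": (a + b - c - d) / 2.0,
        "HH": (a - b - c + d) / 2.0,
    }
    return {name: float(np.sum(band ** 2)) for name, band in subbands.items()}


if __name__ == "__main__":
    # A synthetic stand-in for a binarized word image (hypothetical data).
    rng = np.random.default_rng(0)
    word_image = (rng.random((32, 96)) > 0.5).astype(float)
    print(haar_subband_energies(word_image))
```

Because the transform is orthonormal, the four energies sum to the energy of the (cropped) image, which is a convenient sanity check. The resulting values could be concatenated with other descriptors into a feature vector before being passed to a classifier.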