Separating Indic Scripts with 'matra' - A Precursor to Script Identification in Multi-script Documents

Cited by: 3
Authors
Obaidullah, Sk. Md. [1]
Goswami, Chitrita [1]
Santosh, K. C. [2]
Halder, Chayan [3]
Das, Nibaran [4]
Roy, Kaushik [3]
Affiliations
[1] Aliah Univ, Dept Comp Sci & Engn, Kolkata, W Bengal, India
[2] Univ South Dakota, Dept Comp Sci, Vermillion, SD 57069 USA
[3] West Bengal State Univ, Dept Comp Sci, Kolkata, W Bengal, India
[4] Jadavpur Univ, Dept Comp Sci & Engn, Kolkata, W Bengal, India
Keywords
Handwritten script identification; 'matra'-based script; Topological feature; Fractal geometry analysis
DOI
10.1007/978-981-10-2104-6_19
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Here, we present a new technique for separating Indic scripts based on the matra (or shirorekha, the horizontal headline that joins the characters of a word), using an optimized fractal geometry analysis (FGA) as the sole pertinent feature. Separating scripts that have a matra from those that do not can serve as a precursor that eases the subsequent script identification process. We consider two matra-based scripts, Bangla and Devanagari, as positive samples, and take counter-samples from two scripts without a matra, Roman and Urdu. In total, we used 1204 document images: 525 matra-based (325 Bangla and 200 Devanagari) and 679 non-matra-based (370 Roman and 309 Urdu). For experimentation, we used three classifiers, multilayer perceptron (MLP), random forest (RF), and BayesNet (BN), with the aim of selecting the best performer. Over a series of tests, the MLP classifier achieved an average accuracy of 96.44%.
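The abstract does not say how the optimized FGA feature is computed. As an illustration only, the sketch below computes a standard box-counting fractal dimension of a binarized image (foreground pixels = 1, background = 0). Treating FGA as plain box counting, the chosen box sizes, and the function names `box_count` and `box_counting_dimension` are all assumptions made for this example, not the authors' method. It shows why such a measure can respond to stroke structure: a long, thin horizontal stroke is close to one-dimensional, while a densely filled region is close to two-dimensional.

```python
import math

# Hypothetical sketch: box-counting fractal dimension of a binary image.
# Assumption: the image is a list of rows, with 1 = ink (foreground), 0 = paper.


def box_count(image, box_size):
    """Count box_size x box_size tiles that contain at least one foreground pixel."""
    rows, cols = len(image), len(image[0])
    count = 0
    for r0 in range(0, rows, box_size):
        for c0 in range(0, cols, box_size):
            # Tiles at the right/bottom edges are clipped to the image bounds.
            if any(image[r][c]
                   for r in range(r0, min(r0 + box_size, rows))
                   for c in range(c0, min(c0 + box_size, cols))):
                count += 1
    return count


def box_counting_dimension(image, box_sizes=(1, 2, 4, 8, 16)):
    """Estimate the fractal dimension as the least-squares slope of
    log N(s) against log(1/s), where N(s) is the box count at size s."""
    xs, ys = [], []
    for s in box_sizes:
        n = box_count(image, s)
        if n > 0:  # log(0) is undefined; skip empty counts
            xs.append(math.log(1.0 / s))
            ys.append(math.log(n))
    if len(xs) < 2:
        raise ValueError("need at least two box sizes with foreground pixels")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


if __name__ == "__main__":
    # A 16x16 image whose top row is a full-width horizontal stroke.
    line = [[1] * 16] + [[0] * 16 for _ in range(15)]
    # A fully inked 16x16 image.
    full = [[1] * 16 for _ in range(16)]
    print(round(box_counting_dimension(line), 6))
    print(round(box_counting_dimension(full), 6))
```

For the single full-width line, N(s) = 16/s, so the estimate is 1.0 up to rounding; for the fully inked image, N(s) = (16/s)^2, giving 2.0. On real handwritten text the value falls between these extremes, and any matra/non-matra decision would still need a trained classifier such as the MLP, RF, or BN models named above.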
Pages: 205-214
Page count: 10