Separating Indic Scripts with 'matra'-A Precursor to Script Identification in Multi-script Documents

被引:3
|
作者
Obaidullah, Sk. Md. [1 ]
Goswami, Chitrita [1 ]
Santosh, K. C. [2 ]
Halder, Chayan [3 ]
Das, Nibaran [4 ]
Roy, Kaushik [3 ]
机构
[1] Aliah Univ, Dept Comp Sci & Engn, Kolkata, W Bengal, India
[2] Univ South Dakota, Dept Comp Sci, Vermillion, SD 57069 USA
[3] West Bengal State Univ, Dept Comp Sci, Kolkata, W Bengal, India
[4] Jadavpur Univ, Dept Comp Sci & Engn, Kolkata, W Bengal, India
关键词
Handwritten script identification; 'matra' based script; Topological feature; Fractal geometry analysis;
D O I
10.1007/978-981-10-2104-6_19
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Here, we present a new technique for separating Indic scripts based on matra (or shirorekha), where an optimized fractal geometry analysis (FGA) is used as the sole pertinent feature. Separating those scripts having matra from those which do not have one, can be used as a precursor to ease the subsequent script identification process. In our work, we consider two matra-based scripts namely Bangla and Devanagari as positive samples, and the counter samples are obtained from two different scripts namely Roman and Urdu. Altogether, we took 1204 document images with a distribution of 525 matra-based (325 Bangla and 200 Devanagari) and 679 without matra-based (370 Roman and 309 Urdu) scripts. For experimentation, we have used three different classifiers: multilayer perceptron (MLP), random forest (RF), and BayesNet (BN), with the target of selecting the best performer. From a series of test, we achieved an average accuracy of 96.44% from MLP classifier.
引用
收藏
页码:205 / 214
页数:10
相关论文
共 50 条
  • [1] Separating Indic Scripts with matra for Effective Handwritten Script Identification in Multi-Script Documents
    Obaidullah, Sk Md
    Goswami, Chitrita
    Santosh, K. C.
    Das, Nibaran
    Halder, Chayan
    Roy, Kaushik
    [J]. INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2017, 31 (05)
  • [2] A blind indic script recognizer for multi-script documents
    Pati, Peeta Basa
    Ramakrishnan, A. G.
    [J]. ICDAR 2007: NINTH INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION, VOLS I AND II, PROCEEDINGS, 2007, : 1248 - 1252
  • [3] Script Identification of Multi-Script Documents: A Survey
    Ubul, Kurban
    Tursun, Gulzira
    Aysa, Alimjan
    Impedovo, Donato
    Pirlo, Giuseppe
    Yibulayin, Tuergen
    [J]. IEEE ACCESS, 2017, 5 : 6546 - 6559
  • [4] Handwritten Indic Script Identification in Multi-Script Document Images: A Survey
    Obaidullah, Sk Md
    Santosh, K. C.
    Das, Nibaran
    Halder, Chayan
    Roy, Kaushik
    [J]. INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2018, 32 (10)
  • [5] Word-Level Thirteen Official Indic Languages Database for Script Identification in Multi-script Documents
    Obaidullah, Sk Md
    Santosh, K. C.
    Halder, Chayan
    Das, Nibaran
    Roy, Kaushik
    [J]. RECENT TRENDS IN IMAGE PROCESSING AND PATTERN RECOGNITION (RTIP2R 2016), 2017, 709 : 16 - 27
  • [6] Identification of different script lines from multi-script documents
    Pal, U
    Chaudhuri, BB
    [J]. IMAGE AND VISION COMPUTING, 2002, 20 (13-14) : 945 - 954
  • [7] HVS inspired system for script identification in Indian multi-script documents
    Pati, PB
    Ramakrishnan, AG
    [J]. DOCUMENT ANALYSIS SYSTEMS VII, PROCEEDINGS, 2006, 3872 : 380 - 389
  • [8] Statistical comparison of classifiers for script identification from multi-script handwritten documents
    Singh, Pawan Kumar
    Sarkar, Ram
    Das, Nibaran
    Basu, Subhadip
    Nasipuri, Mita
    [J]. INTERNATIONAL JOURNAL OF APPLIED PATTERN RECOGNITION, 2014, 1 (02) : 152 - 172
  • [9] Page-level Script Identification from Multi-script Handwritten Documents
    Singh, Pawan Kumar
    Dalal, Santu Kumar
    Sarkar, Ram
    Nasipuri, Mita
    [J]. 2015 THIRD INTERNATIONAL CONFERENCE ON COMPUTER, COMMUNICATION, CONTROL AND INFORMATION TECHNOLOGY (C3IT), 2015,
  • [10] Word-Level Script Identification from Handwritten Multi-script Documents
    Singh, Pawan Kumar
    Mondal, Arafat
    Bhowmik, Showmik
    Sarkar, Ram
    Nasipuri, Mita
    [J]. PROCEEDINGS OF THE 3RD INTERNATIONAL CONFERENCE ON FRONTIERS OF INTELLIGENT COMPUTING: THEORY AND APPLICATIONS (FICTA) 2014, VOL 1, 2015, 327 : 551 - 558