Separating Indic Scripts with 'matra'-A Precursor to Script Identification in Multi-script Documents

Times cited: 3

Authors:

Obaidullah, Sk. Md. [1]

Goswami, Chitrita [1]

Santosh, K. C. [2]

Halder, Chayan [3]

Das, Nibaran [4]

Roy, Kaushik [3]

Affiliations:

[1] Aliah Univ, Dept Comp Sci & Engn, Kolkata, W Bengal, India

[2] Univ South Dakota, Dept Comp Sci, Vermillion, SD 57069 USA

[3] West Bengal State Univ, Dept Comp Sci, Kolkata, W Bengal, India

[4] Jadavpur Univ, Dept Comp Sci & Engn, Kolkata, W Bengal, India

Source:

PROCEEDINGS OF INTERNATIONAL CONFERENCE ON COMPUTER VISION AND IMAGE PROCESSING, CVIP 2016, VOL 1 | 2017 / Vol. 459

Keywords:

Handwritten script identification; 'matra'-based script; Topological feature; Fractal geometry analysis

DOI:

10.1007/978-981-10-2104-6_19

Chinese Library Classification (CLC) number:

TP18 [Theory of artificial intelligence]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Here, we present a new technique for separating Indic scripts based on matra (or shirorekha), in which an optimized fractal geometry analysis (FGA) is used as the sole pertinent feature. Separating scripts that have a matra from those that do not can serve as a precursor that eases the subsequent script identification process. In our work, we consider two matra-based scripts, namely Bangla and Devanagari, as positive samples, and the counter-samples are obtained from two other scripts, namely Roman and Urdu. Altogether, we took 1204 document images: 525 from matra-based scripts (325 Bangla and 200 Devanagari) and 679 from scripts without matra (370 Roman and 309 Urdu). For experimentation, we used three different classifiers, namely multilayer perceptron (MLP), random forest (RF), and BayesNet (BN), with the aim of selecting the best performer. From a series of tests, we achieved an average accuracy of 96.44% with the MLP classifier.
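The abstract names fractal geometry analysis (FGA) as the only feature but gives no formula. As a rough illustration only, the sketch below estimates the box-counting dimension of a binarized image, a common FGA measure: cover the image with boxes of side s, count the boxes N(s) that contain ink, and take the slope of log N(s) against log(1/s). The helper names `box_count` and `box_counting_dimension` are hypothetical, the box sizes (1, 2, 4, 8) are an assumption, and this is a plain estimator, not the authors' optimized FGA.

```python
import math
from typing import Sequence

Image = Sequence[Sequence[int]]  # binarized image: 1 = ink, 0 = background


def box_count(img: Image, size: int) -> int:
    """Count the size x size boxes that contain at least one ink pixel."""
    rows, cols = len(img), len(img[0])
    count = 0
    for r0 in range(0, rows, size):
        for c0 in range(0, cols, size):
            # Boxes at the right and bottom edges are clipped to the image border.
            if any(img[r][c]
                   for r in range(r0, min(r0 + size, rows))
                   for c in range(c0, min(c0 + size, cols))):
                count += 1
    return count


def box_counting_dimension(img: Image, sizes: Sequence[int] = (1, 2, 4, 8)) -> float:
    """Estimate the fractal dimension as the least-squares slope of
    log N(s) against log(1/s). The default box sizes are an assumed choice."""
    xs, ys = [], []
    for s in sizes:
        n = box_count(img, s)
        if n > 0:
            xs.append(math.log(1.0 / s))
            ys.append(math.log(n))
    if len(set(xs)) < 2:
        raise ValueError("need ink pixels and at least two distinct box sizes")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


if __name__ == "__main__":
    square = [[1] * 16 for _ in range(16)]             # solid 16x16 ink block
    line = [[1] * 16] + [[0] * 16 for _ in range(15)]  # one horizontal stroke
    print(round(box_counting_dimension(square), 3))  # 2.0
    print(round(box_counting_dimension(line), 3))    # 1.0
```

The two toy inputs bracket the possible range: a filled area scales like a plane (dimension 2), while a single horizontal stroke scales like a line (dimension 1). Real handwritten text falls in between, and how such a value is tuned or combined for matra detection is not specified in the abstract.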


Pages: 205-214

Number of pages: 10
