Script Identification for Printed and Handwritten Indian Documents: An Empirical Study of Different Feature Classifier Combinations

Cited by: 5
Authors
Rani, Rajneesh [1]
Dhir, Renu [1]
Kakkar, Deepti [2]
Sharma, Nonita [1]
Affiliations
[1] Dr BR Ambedkar Natl Inst Technol, Dept Comp Sci & Engn, Jalandhar 144011, Punjab, India
[2] Dr BR Ambedkar Natl Inst Technol, Dept Elect & Commun Engn, Jalandhar 144011, Punjab, India
Keywords
Script identification; page level; texture features; machine learning; Gabor; wavelet; invariant texture features; rotation
DOI
10.1142/S0219467821400118
Chinese Library Classification
TP31 [Computer software]
Discipline classification codes
081202; 0835
Abstract
Identifying the script in a document page image is the first step for an OCR system that processes multi-script documents. In a multilingual, multiscript world, a document processing system that requires human involvement to select the appropriate OCR package is undesirable and inefficient. Developing robust and efficient methods for automatic script identification is therefore of major importance for automatic document processing in a multilingual, multiscript environment. The basic objective is to arrive at intuitive methods that are straightforward to implement without compromising efficiency. The aim of this work is to evaluate state-of-the-art feature extraction and classification techniques for automatic script identification of printed and handwritten documents, and to propose the best combination of the two.
Pages: 21