Script Identification for Printed and Handwritten Indian Documents: An Empirical Study of Different Feature Classifier Combinations

Cited by: 5
Authors
Rani, Rajneesh [1]
Dhir, Renu [1]
Kakkar, Deepti [2]
Sharma, Nonita [1]
Affiliations
[1] Dr BR Ambedkar Natl Inst Technol, Dept Comp Sci & Engn, Jalandhar 144011, Punjab, India
[2] Dr BR Ambedkar Natl Inst Technol, Dept Elect & Commun Engn, Jalandhar 144011, Punjab, India
Keywords
Script identification; page level; texture features; machine learning; Gabor; wavelet; INVARIANT TEXTURE FEATURES; ROTATION;
DOI
10.1142/S0219467821400118
Chinese Library Classification (CLC) number
TP31 [Computer software]
Subject classification codes
081202; 0835
Abstract
Identifying the script in a document page image is the first step for an OCR system that processes multi-script documents. In a multilingual, multi-script world, a document processing system that needs human involvement to select the appropriate OCR package is undesirable and inefficient. Developing robust and efficient methods for automatic script identification is therefore of major importance for automatic document processing in a multilingual, multi-script environment. The basic objective is to devise intuitive methods that are straightforward to implement without compromising efficiency. The aim of this work is to evaluate state-of-the-art feature extraction and classification techniques for automatic script identification of printed and handwritten documents and to propose the best combination.
Pages: 21
Related papers
50 in total
  • [21] Script Identification using Gabor Feature and SVM Classifier
    Chaudhari, Shailesh
    Gulati, Ravi M.
    PROCEEDINGS OF INTERNATIONAL CONFERENCE ON COMMUNICATION, COMPUTING AND VIRTUALIZATION (ICCCV) 2016, 2016, 79 : 85 - 92
  • [22] Extreme learning machine for handwritten Indic script identification in multiscript documents
    Obaidullah, Sk. Md.
    Bose, Amitava
    Mukherjee, Himadri
    Santosh, K. C.
    Das, Nibaran
    Roy, Kaushik
    JOURNAL OF ELECTRONIC IMAGING, 2018, 27 (05)
  • [23] Separating Indic Scripts with matra for Effective Handwritten Script Identification in Multi-Script Documents
    Obaidullah, Sk Md
    Goswami, Chitrita
    Santosh, K. C.
    Das, Nibaran
    Halder, Chayan
    Roy, Kaushik
    INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2017, 31 (05)
  • [24] Handwritten and Printed Word Identification Using Gray-Scale Feature Vector and Decision Tree Classifier
    Malakar, Samir
    Das, Rahul Kumar
    Sarkar, Ram
    Basu, Subhadip
    Nasipuri, Mita
    FIRST INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE: MODELING TECHNIQUES AND APPLICATIONS (CIMTA) 2013, 2013, 10 : 831 - 839
  • [25] HVS inspired system for script identification in Indian multi-script documents
    Pati, PB
    Ramakrishnan, AG
    DOCUMENT ANALYSIS SYSTEMS VII, PROCEEDINGS, 2006, 3872 : 380 - 389
  • [26] Identification of different script lines from multi-script documents
    Pal, U
    Chaudhuri, BB
    IMAGE AND VISION COMPUTING, 2002, 20 (13-14) : 945 - 954
  • [27] Neural network based system for script identification in Indian documents
    S. Basavaraj Patil
    N. V. Subbareddy
    Sadhana, 2002, 27 : 83 - 97
  • [28] Automatic feature selection with applications to script identification of degraded documents
    Ablavsky, V
    Stevens, MR
    SEVENTH INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION, VOLS I AND II, PROCEEDINGS, 2003, : 750 - 754
  • [29] Multi-script line identification from Indian documents
    Pal, U
    Sinha, S
    Chaudhuri, BB
    SEVENTH INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION, VOLS I AND II, PROCEEDINGS, 2003, : 880 - 884
  • [30] Neural network based system for script identification in Indian documents
    Patil, SB
    Subbareddy, NV
    SADHANA-ACADEMY PROCEEDINGS IN ENGINEERING SCIENCES, 2002, 27 (1) : 83 - 97