Multi-script line identification from Indian documents

Authors
Pal, U [1]
Sinha, S [1]
Chaudhuri, BB [1]
Affiliation
[1] Indian Stat Inst, Comp Vis & Pattern Recognit Unit, Kolkata 700108, India
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
A document page may contain two or more different scripts. For Optical Character Recognition (OCR) of such a page, the scripts must be separated before each is fed to its own OCR system. This paper presents an automatic scheme to identify text lines of different Indian scripts in a document. For the separation task, the scripts are first grouped into a few classes according to their script characteristics. Next, features based on the water reservoir principle, contour tracing, profiles, etc. are employed to identify them without any expensive OCR-like algorithms. At present, the system has an overall accuracy of about 97.52%.
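The abstract names the water reservoir principle without defining it. As a rough illustration only, the idea can be read as pouring water onto a character component from above and measuring how much collects in its cavities. The sketch below is an assumption about one simple way to compute such a feature, not the authors' implementation: the function name `top_reservoir_area` and the 0/1 grid representation are hypothetical, and ink hidden beneath overhangs is ignored because only the topmost ink pixel in each column is used.

```python
def top_reservoir_area(component):
    """Count background pixels that would hold water poured from the top.

    `component` is a list of equal-length rows of 0/1 values (1 = ink).
    Hypothetical helper for illustration; not taken from the paper.
    """
    rows = len(component)
    cols = len(component[0])

    # Terrain height per column: distance from the bottom row up to the
    # topmost ink pixel (0 if the column contains no ink).
    heights = []
    for c in range(cols):
        top = next((r for r in range(rows) if component[r][c]), rows)
        heights.append(rows - top)

    # Highest terrain seen so far when scanning from the left.
    left_max = []
    running = 0
    for h in heights:
        running = max(running, h)
        left_max.append(running)

    # Highest terrain seen so far when scanning from the right.
    right_max = [0] * cols
    running = 0
    for c in range(cols - 1, -1, -1):
        running = max(running, heights[c])
        right_max[c] = running

    # Water above a column is bounded by the lower of its two enclosing walls.
    return sum(min(l, r) - h for l, r, h in zip(left_max, right_max, heights))


# A U-shaped stroke is open at the top, so it holds water.
u_shape = [
    [1, 0, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
]
# An arch is closed at the top, so nothing collects from above.
arch = [
    [1, 1, 1, 1],
    [1, 0, 0, 1],
    [1, 0, 0, 1],
]
print(top_reservoir_area(u_shape), top_reservoir_area(arch))
```

Under this reading, components that open upward yield a large top reservoir and components closed at the top yield none, which suggests how such a feature might help tell script classes apart without running full character recognition.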
Pages: 880-884
Number of pages: 5