anyOCR: An Open-Source OCR System for Historical Archives

Cited by: 18
Authors
Bukhari, Syed Saqib [1]
Kadi, Ahmad [1]
Jouneh, Mohammad Ayman [1]
Mir, Fahim Mahmood [1]
Dengel, Andreas [1]
Affiliations
[1] University of Kaiserslautern, German Research Center for Artificial Intelligence (DFKI), Kaiserslautern, Germany
Keywords
OCR System; Historical Archives; End-To-End Document Image Processing
DOI
10.1109/ICDAR.2017.58
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
A considerable amount of research is currently devoted to digitizing historical archives, that is, converting scanned document images into searchable full text. This paper presents the "anyOCR" system, which emphasizes the techniques required to digitize a historical archive with high accuracy. It is an open-source system that the research community can readily apply to digitizing historical archives. The anyOCR system supports a complete document processing pipeline, which includes layout analysis, OCR model training, and text line prediction, together with intelligent, interactive web applications for correcting layout and OCR errors. The anyOCR system can also be used for contemporary document images with diverse layouts, from simple to complex. This paper describes the current state of the anyOCR system, its architecture, and its major features. It also provides information about the availability, documentation, and tutorials of the anyOCR system.
Pages: 305-310
Number of pages: 6