Open Source Historical OCR: The OCRopodium Project

Authors
Bryant, Michael [1]
Blanke, Tobias [1]
Hedges, Mark [1]
Palmer, Richard [1]
Affiliations
[1] King's College London, Centre for e-Research, London, England
DOI
Not available
Chinese Library Classification
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
In this paper we present some initial results of the OCRopodium project to build a scalable workflow for OCR of historical collections. Large-scale digitisation projects dealing with text-based historical material face challenges that are not well catered for by commercial software. Open source tools allow for better customisation to match these requirements, particularly with regard to character model training and per-project language modelling.
Pages: 522-525
Page count: 4