anyOCR: An Open-Source OCR System for Historical Archives

Cited by: 18
Authors
Bukhari, Syed Saqib [1 ]
Kadi, Ahmad [1 ]
Jouneh, Mohammad Ayman [1 ]
Mir, Fahim Mahmood [1 ]
Dengel, Andreas [1 ]
Affiliations
[1] Univ Kaiserslautern, German Res Ctr Artificial Intelligence DFKI, Kaiserslautern, Germany
Keywords
OCR System; Historical Archives; End-To-End Document Image Processing
DOI
10.1109/ICDAR.2017.58
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
A great deal of research is currently under way on digitizing historical archives, that is, converting scanned document images into searchable full text. This paper presents the "anyOCR" system, which emphasizes the techniques required to digitize a historical archive with high accuracy. It is an open-source system that the research community can readily apply to digitizing historical archives. The anyOCR system supports a complete document processing pipeline, which includes layout analysis, OCR model training, and text line prediction, along with intelligent, interactive web applications for correcting layout and OCR errors. The anyOCR system can also be used for contemporary document images with layouts ranging from simple to complex. This paper describes the current state of the anyOCR system, its architecture, and its major features. It also provides information about the availability, documentation, and tutorials of the anyOCR system.
Pages: 305-310
Page count: 6