anyOCR: An Open-Source OCR System for Historical Archives

Times cited: 18
Authors
Bukhari, Syed Saqib [1]
Kadi, Ahmad [1]
Jouneh, Mohammad Ayman [1]
Mir, Fahim Mahmood [1]
Dengel, Andreas [1]
Affiliation
[1] Univ Kaiserslautern, German Res Ctr Artificial Intelligence DFKI, Kaiserslautern, Germany
Keywords
OCR System; Historical Archives; End-To-End Document Image Processing;
DOI
10.1109/ICDAR.2017.58
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
Currently, a large amount of research is being conducted on digitizing historical archives, that is, converting scanned document images into searchable full text. This paper presents the "anyOCR" system, which mainly emphasizes the techniques required to digitize a historical archive with high accuracy. It is an open-source system that the research community can readily apply to digitize historical archives. The anyOCR system supports a complete document-processing pipeline, including layout analysis, OCR model training, and text-line prediction, along with intelligent, interactive web applications for correcting layout and OCR errors. The anyOCR system can also be used for contemporary document images with diverse layouts, from simple to complex. This paper describes the current state of the anyOCR system, its architecture, and its major features. It also provides information about the availability, documentation, and tutorials of the anyOCR system.
Pages: 305-310
Number of pages: 6