Historical Document Processing: A Survey of Techniques, Tools, and Trends

Cited by: 6
Authors
Philips, James [1]
Tabrizi, Nasseh [1]
Affiliations
[1] East Carolina Univ, Dept Comp Sci, Greenville, NC 27858 USA
Funding
National Science Foundation (USA)
Keywords
Historical Document Processing; Archival Data; Handwriting Recognition; Optical Character Recognition; Digital Humanities;
DOI
10.5220/0010177403410349
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Historical Document Processing (HDP) is the process of digitizing written material from the past for future use by historians and other scholars. It incorporates algorithms and software tools from computer vision, document analysis and recognition, natural language processing, and machine learning to convert images of ancient manuscripts and early printed texts into a digital format usable in data mining and information retrieval systems. As libraries and other cultural heritage institutions have scanned their historical document archives, the need to transcribe the full text from these collections has become acute. Since HDP encompasses multiple sub-domains of computer science, knowledge relevant to its purpose is scattered across numerous journals and conference proceedings. This paper surveys the major phases of HDP, discusses standard algorithms, tools, and datasets, and suggests directions for further research.
Pages: 341-349
Number of pages: 9