Information Extraction from Handwritten Tables in Historical Documents

Cited by: 7
Authors
Andres, Jose [1]
Ramon Prieto, Jose [1]
Granell, Emilio [1]
Romero, Veronica [2]
Andreu Sanchez, Joan [1]
Vidal, Enrique [1]
Affiliations
[1] Univ Politecn Valencia, PRHLT Res Ctr, Valencia, Spain
[2] Univ Valencia, Dept Informat, Valencia, Spain
Keywords
Structured handwritten documents; Information extraction; Neural networks; DATABASE
DOI
10.1007/978-3-031-06555-2_13
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recently, significant advances have been made in Document Understanding in structured historical documents. However, not much research has been done in information extraction from handwritten structured historical documents. In this paper, we compare two Machine Learning approaches and another approach that is based on heuristic rules to extract information in historical pre-printed forms with handwritten information. We analyze how each approach performs at each step of the extraction process. The proposed approaches improve the heuristic-rule baseline by up to 0.14 F-measure points throughout the information extraction pipeline.
Pages: 184-198
Number of pages: 15
Related papers
50 in total
  • [1] Handwritten information extraction from historical census documents
    Nion, Thibauld
    Menasri, Fares
    Louradour, Jerome
    Sibade, Cedric
    Retornaz, Thomas
    Metaireau, Pierre-Yves
    Kermorvant, Christopher
    [J]. 2013 12TH INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION (ICDAR), 2013, : 822 - 826
  • [2] Text Line Extraction in Handwritten Historical Documents
    Capobianco, Samuele
    Marinai, Simone
    [J]. DIGITAL LIBRARIES AND ARCHIVES, IRCDL 2017, 2017, 733 : 68 - 79
  • [3] Extraction of handwritten information in geometrically distorted documents
    Safari, R
    Narasimhamurthi, N
    Shridhar, M
    Ahmadi, M
    [J]. FOURTEENTH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOLS 1 AND 2, 1998, : 1298 - 1300
  • [4] Information extraction in handwritten historical logbooks
    Ramon Prieto, Jose
    Andres, Jose
    Granell, Emilio
    Sanchez, Joan Andreu
    Vidal, Enrique
    [J]. PATTERN RECOGNITION LETTERS, 2023, 172 : 128 - 136
  • [5] Clustering Web Documents with Tables for Information Extraction
    Shchekotykhin, Kostyantyn
    Jannach, Dietmar
    Friedrich, Gerhard
    [J]. K-CAP'07: PROCEEDINGS OF THE FOURTH INTERNATIONAL CONFERENCE ON KNOWLEDGE CAPTURE, 2007, : 169 - 170
  • [6] Lanna Handwritten Character Recognition on Historical Documents Using Feature Extraction
    Khankasikam, Krisda
    [J]. INFORMATION TECHNOLOGY APPLICATIONS IN INDUSTRY, PTS 1-4, 2013, 263-266 : 2553 - 2560
  • [7] Preserving Text Content from Historical Handwritten Documents
    Chakraborty, Arpita
    Blumenstein, Michael
    [J]. PROCEEDINGS OF 12TH IAPR WORKSHOP ON DOCUMENT ANALYSIS SYSTEMS, (DAS 2016), 2016, : 329 - 334
  • [8] Recognition and Information Extraction in Historical Handwritten Tables: Toward Understanding Early 20th Century Paris Census
    Constum, Thomas
    Kempf, Nicolas
    Paquet, Thierry
    Tranouez, Pierrick
    Chatelain, Clement
    Bree, Sandra
    Merveille, Francois
    [J]. DOCUMENT ANALYSIS SYSTEMS, DAS 2022, 2022, 13237 : 143 - 157
  • [9] A Thresholding Approach for Text Extraction in Handwritten Historical Documents using Adaptive Morphology
    Roy, Bishakha
    Chatterjee, Rohit Kamal
    [J]. 2014 FOURTH INTERNATIONAL CONFERENCE OF EMERGING APPLICATIONS OF INFORMATION TECHNOLOGY (EAIT), 2014, : 198 - 203
  • [10] HTR for Greek Historical Handwritten Documents
    Tsochatzidis, Lazaros
    Symeonidis, Symeon
    Papazoglou, Alexandros
    Pratikakis, Ioannis
    [J]. JOURNAL OF IMAGING, 2021, 7 (12)