ICDAR2017 Competition on Post-OCR Text Correction

Cited by: 20
Authors
Chiron, Guillaume [1]
Doucet, Antoine [2]
Coustaty, Mickael [2]
Moreux, Jean-Philippe [1]
Affiliations
[1] Natl Lib France, F-75706 Paris, France
[2] Univ La Rochelle, Lab L3i, Av Michel Crepeau, F-17000 La Rochelle, France
DOI
10.1109/ICDAR.2017.232
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
This paper describes the ICDAR2017 competition on post-OCR text correction and presents the methods submitted by the participants. OCR has been an active research field for more than 30 years, but results are still imperfect, especially for historical documents. The purpose of this competition is to compare and evaluate automatic approaches for correcting (denoising) OCR-ed texts. The challenge consists of two independent tasks: 1) error detection and 2) error correction. An original dataset of 12M OCR-ed symbols, along with an aligned ground truth, was provided to the participants, with 80% of the dataset dedicated to training and 20% to evaluation. The dataset aggregates different sources, notably newspapers and monographs, and covers two languages (English and French). Eleven teams submitted results, and the difficulty of the task was underlined by the fact that only half of the submitted methods were able to denoise the evaluation dataset on average. In any case, this competition, which counted 35 registrations, illustrates the strong interest of the community in this essential problem, which is key to any digitization process involving textual data.
Pages: 1423-1428
Number of pages: 6