A Modular and Automated Annotation Platform for Handwritings: Evaluation on Under-Resourced Languages

Cited by: 4
Authors
Vidal-Gorene, Chahan [1, 2]
Dupin, Boris [2]
Decours-Perez, Alienor [2]
Riccioli, Thomas [2]
Affiliations
[1] Univ Paris, Sci & Lettres, Ecole Natl Chartes, 65 Rue Richelieu, F-75002 Paris, France
[2] MIE Bastille, Calfa, 50 Rue Tournelles, F-75003 Paris, France
Keywords
HTR; OCR; Historical documents; Layout analysis; Text line extraction; Crowdsourcing; Dataset; Under-resourced language; Armenian; RECOGNITION;
DOI
10.1007/978-3-030-86334-0_33
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
There are today several approaches to automatic handwritten document analysis. HTR systems in particular achieve convincing results in both layout analysis and text recognition, as well as in more recent tasks such as named-entity recognition, script identification, and manuscript dating. These systems are trained and evaluated on large open and specialized databases. Manual annotation and proofreading of handwritten documents is a key step in training such systems. However, it is a time-consuming task, especially when the formats required by the systems vary considerably, or when the interfaces do not manage several levels of information. We propose a new modular, collaborative, ready-to-use online interface for multilevel annotation and quick viewing of handwritten and printed documents, including documents in right-to-left languages. This interface supports the creation of customized projects, as well as the management, conversion, and export of data in the various state-of-the-art formats and standards. It includes automated tasks for layout analysis and text-line extraction, with high-level fine-tuning capabilities. We present this new interface through a case study: the creation of a database for Armenian, an under-resourced language with specific paleographical issues.
Pages: 507-522
Page count: 16