Language Identification for Interactive Handwriting Transcription of Multilingual Documents

Cited by: 0
Authors
del Agua, Miguel A. [1]
Serrano, Nicolas [1]
Juan, Alfons [1]
Affiliations
[1] Univ Politecn Valencia, DSIC ITI, Valencia, Spain
Keywords
Language Identification; Interactive Handwriting Transcription; Multilingual Documents
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
An effective approach to handwriting transcription of (old) documents is to follow a sequential, line-by-line transcription of the whole document, in which a continuously retrained system interacts with the user. In the case of multilingual documents, however, a minor yet important issue for this interactive approach is to first identify the language of the current text line image to be transcribed. In this paper, we propose a probabilistic framework and three techniques for this purpose. Empirical results are reported on an entire 764-page multilingual document for which previous empirical tests were limited to its first 180 pages, written only in Spanish.
Pages: 596-603
Page count: 8