Segmentation-Free Streaming Machine Translation

Cited by: 0
Authors
Iranzo-Sanchez, Javier [1]
Iranzo-Sanchez, Jorge [1]
Gimenez, Adria [2]
Civera, Jorge [1]
Juan, Alfons [1]
Affiliations
[1] Univ Politecn Valencia, VRAIN, Machine Learning & Language Proc, Valencia, Spain
[2] Univ Valencia, Dept Informat, Escola Tecn Super Engn, Valencia, Spain
DOI
10.1162/tacl_a_00691
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Streaming Machine Translation (MT) is the task of translating an unbounded input text stream in real time. The traditional cascade approach, which combines an Automatic Speech Recognition (ASR) system and an MT system, relies on an intermediate segmentation step that splits the transcription stream into sentence-like units. However, imposing a hard segmentation constrains the MT system and is a source of errors. This paper proposes a Segmentation-Free framework that enables the model to translate an unsegmented source stream by delaying the segmentation decision until after the translation has been generated. Extensive experiments show that the proposed Segmentation-Free framework achieves a better quality-latency trade-off than competing approaches that use an independent segmentation model.
Pages: 1104-1121
Number of pages: 18