Microsoft Translator at WMT 2019: Towards Large-Scale Document-Level Neural Machine Translation

Cited by: 0
Authors
Junczys-Dowmunt, Marcin [1 ]
Affiliations
[1] Microsoft, One Microsoft Way, Redmond, WA 98052 USA
Keywords
DOI
None available
CLC number
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper describes the Microsoft Translator submissions to the WMT19 news translation shared task for English-German. Our main focus is document-level neural machine translation with deep transformer models. We start with strong sentence-level baselines, trained on large-scale data created via data filtering and noisy back-translation, and find that back-translation seems to mainly help with translationese input. We explore fine-tuning techniques, deeper models, and different ensembling strategies to counter these effects. Using document boundaries present in the authentic and synthetic parallel data, we create sequences of up to 1000 subword segments and train transformer translation models. We experiment with data augmentation techniques for the smaller authentic data with document boundaries and for the larger authentic data without boundaries. We further explore multi-task training for the incorporation of document-level source-language monolingual data via the BERT objective on the encoder, and two-pass decoding for combinations of sentence-level and document-level systems. Based on preliminary human evaluation results, evaluators strongly prefer the document-level systems over our comparable sentence-level system. The document-level systems also seem to score higher than the human references in source-based direct assessment.
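The abstract's core data-preparation step is concatenating sentences that share a document into training sequences of at most 1000 subword segments, never crossing a document boundary. A minimal sketch of that idea is below; the function name, the `<sep>` separator convention, and the toy data are illustrative assumptions, not details taken from the paper or from any actual Marian implementation.

```python
# Hypothetical sketch: pack subword-tokenized sentences into document-level
# training sequences of at most max_len tokens, respecting document
# boundaries, as the abstract describes. Separator token and names are
# assumptions for illustration.

def build_doc_sequences(docs, max_len=1000, sep="<sep>"):
    """docs: list of documents; each document is a list of sentences,
    each sentence a list of subword tokens. Returns training sequences
    no longer than max_len tokens that never span two documents."""
    sequences = []
    for doc in docs:
        current = []
        for sent in doc:
            # +1 accounts for the separator inserted between sentences
            extra = len(sent) + (1 if current else 0)
            if current and len(current) + extra > max_len:
                sequences.append(current)  # flush and start a new sequence
                current = []
            if current:
                current.append(sep)
            current.extend(sent)
        if current:
            sequences.append(current)  # a document boundary always flushes
    return sequences

# Toy example with a tiny length budget instead of 1000:
docs = [
    [["▁Hello", "▁world", "."], ["▁How", "▁are", "▁you", "?"]],
    [["▁Another", "▁document", "."]],
]
seqs = build_doc_sequences(docs, max_len=8)
```

With `max_len=8`, both sentences of the first toy document fit into one sequence (3 + 1 separator + 4 tokens), and the second document yields its own 3-token sequence, since packing never merges material across documents.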
Pages: 225-233 (9 pages)
Related Papers
50 results
  • [1] Rethinking Document-level Neural Machine Translation
    Sun, Zewei
    Wang, Mingxuan
    Zhou, Hao
    Zhao, Chengqi
    Huang, Shujian
    Chen, Jiajun
    Li, Lei
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), 2022, : 3537 - 3548
  • [2] Document-Level Adaptation for Neural Machine Translation
    Kothur, Sachith Sri Ram
    Knowles, Rebecca
    Koehn, Philipp
    NEURAL MACHINE TRANSLATION AND GENERATION, 2018, : 64 - 73
  • [3] Corpora for Document-Level Neural Machine Translation
    Liu, Siyou
    Zhang, Xiaojun
    PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2020), 2020, : 3775 - 3781
  • [4] Combining Local and Document-Level Context: The LMU Munich Neural Machine Translation System at WMT19
    Stojanovski, Dario
    Fraser, Alexander
    FOURTH CONFERENCE ON MACHINE TRANSLATION (WMT 2019), 2019, : 400 - 406
  • [5] On Search Strategies for Document-Level Neural Machine Translation
    Herold, Christian
    Ney, Hermann
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023), 2023, : 12827 - 12836
  • [6] Scaling Law for Document-Level Neural Machine Translation
    Zhang, Zhuocheng
    Gu, Shuhao
    Zhang, Min
    Feng, Yang
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (EMNLP 2023), 2023, : 8290 - 8303
  • [7] Towards Personalised and Document-level Machine Translation of Dialogue
    Vincent, Sebastian T.
    EACL 2021: THE 16TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: PROCEEDINGS OF THE STUDENT RESEARCH WORKSHOP, 2021, : 137 - 147
  • [8] Document-Level Machine Translation with Large Language Models
    Wang, Longyue
    Lyu, Chenyang
    Ji, Tianbo
    Zhang, Zhirui
    Yu, Dian
    Shi, Shuming
    Tu, Zhaopeng
    2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2023), 2023, : 16646 - 16661
  • [9] Exploring Paracrawl for Document-level Neural Machine Translation
    Al Ghussin, Yusser
    Zhang, Jingyi
    van Genabith, Josef
    17TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EACL 2023, 2023, : 1304 - 1310
  • [10] Encouraging Lexical Translation Consistency for Document-Level Neural Machine Translation
    Lyu, Xinglin
    Li, Junhui
    Gong, Zhengxian
    Zhang, Min
    2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), 2021, : 3265 - 3277